Description

Background of the Invention 

The present invention relates to novel thermostable DNA polymerases, the genes and vectors encoding them, and
their use in DNA sequencing.

US Patents 4,889,818 and 5,079,352 describe the isolation and expression of a DNA polymerase known as Taq
DNA Polymerase (hereinafter referred to as Taq). It is reported that amino-terminal deletions in which approximately
one-third of the coding sequence is absent yield a gene product that is quite active in polymerase
assays. Taq is described as being of use in PCR (polymerase chain reaction).
US Patent No. 5,075,216 describes the use of Taq in DNA sequencing.

International patent application WO 92/06188 describes a DNA polymerase having an identical amino acid
sequence to Taq except that it lacks the N-terminal 235 amino acids of Taq, and its use in sequencing. This DNA
polymerase is known as ΔTaq.

US Patent 4,795,699 describes the use of T7 type DNA polymerases (T7) in DNA sequencing. These are of great
use in DNA sequencing in that they incorporate dideoxy nucleoside triphosphates (NTPs) with an efficiency comparable
to the incorporation of deoxy NTPs; other polymerases incorporate dideoxy NTPs far less efficiently, which requires
comparatively large quantities of these to be present in sequencing reactions.

At the DOE Contractor-Grantee Workshop (Nov. 13-17, 1994, Santa Fe) and the I. Robert Lehman Symposium
(Nov. 11-14, 1994, Sonoma), Prof. S. Tabor identified a site in DNA polymerases that can be modified to incorporate
dideoxy NTPs more efficiently. He reported that the presence or absence of a single hydroxy group (tyrosine vs.
phenylalanine) at a highly conserved position on E. coli DNA Polymerase I, T7, and Taq makes more than a 1000-fold
difference in their ability to discriminate against dideoxy NTPs. (See also European Patent Application 94203433.1,
published May 31, 1995, Publication No. 0 655 506 A1, hereby incorporated by reference herein.)


Summary of the Invention 

The present invention provides a DNA polymerase having an amino acid sequence differentiated from Taq in that
it lacks the N-terminal 272 amino acids and has the phenylalanine at position 667 (of native Taq) replaced by tyrosine.
Preferably, the DNA polymerase has methionine at position 1 (equivalent to position 272 of Taq) (hereinafter referred
to as FY2). The full DNA sequence is given as Fig. 1 (SEQ. ID. NO. 1). Included within the scope of the present invention
are DNA polymerases having substantially identical amino acid sequences to the above which retain thermostability
and efficient incorporation of dideoxy NTPs.

By a substantially identical amino acid sequence is meant a sequence which contains 540 to 582 amino acids that
may have conservative amino acid changes compared with Taq which do not significantly influence thermostability or
nucleotide incorporation, i.e., other than the phenylalanine to tyrosine conversion. Such changes include substitution
of like charged amino acids for one another, or amino acids with small side chains for other small side chains, e.g., ala
for val. More drastic changes may be introduced at noncritical regions where little or no effect on polymerase activity
is observed by such a change.

The invention also features DNA polymerases that lack between 251 and 293 (preferably 271 or 272) of the N-terminal
amino acids of Thermus flavus (Tfl) and have the phenylalanine at position 666 (of native Tfl) replaced by
tyrosine; and those that lack between 253 and 295 (preferably 274) of the N-terminal amino acids of Thermus
thermophilus (Tth) and have the phenylalanine at position 669 (of native Tth) replaced by tyrosine.

By efficient incorporation of dideoxy NTPs is meant the ability of a polymerase to show little, if any, discrimination
in the incorporation of ddNTPs when compared with dNTPs. Suitably efficient discrimination is less than 1:10 and
preferably less than 1:5. Such discrimination can be measured by procedures known in the art.

One preferred substantially identical amino acid sequence to that given above is that which contains 562 amino
acids having methionine at position 1 and alanine at position 2 (corresponding to positions 271 and 272 of native Taq)
(hereinafter referred to as FY3). A full DNA sequence is given as Fig. 2. This is a preferred DNA polymerase for
expression by a gene of the present invention.

The purified DNA polymerases FY2 and FY3 both give a single polypeptide band on SDS polyacrylamide gels,
unlike ΔTaq, which, whether it has phenylalanine or tyrosine at position 667, forms several polypeptide bands of similar
size on SDS polyacrylamide gels.

A second preferred substantially identical amino acid sequence is that which lacks 274 of the N-terminal amino
acids of Thermus thermophilus, having methionine at position 1 and the phenylalanine to tyrosine mutation at position
396 (corresponding to position 669 of native Tth) (hereinafter referred to as FY4). A full DNA sequence is given as Fig.
5 (SEQ. ID. NO. 14).
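The residue numbering used above can be checked with a short sketch. It assumes that each truncated enzyme consists of a single added methionine followed by the native residues that come after the deleted N-terminal stretch; the function and argument names are illustrative and do not appear in the specification.

```python
def truncated_position(native_pos: int, n_deleted: int, n_added: int = 1) -> int:
    """Map a residue number in the native enzyme to the truncated enzyme.

    n_deleted: number of N-terminal residues removed from the native enzyme.
    n_added:   number of residues added at the new N-terminus (assumed to be
               one methionine unless stated otherwise).
    """
    if native_pos <= n_deleted:
        raise ValueError("residue lies in the deleted N-terminal region")
    return native_pos - n_deleted + n_added


# Taq Phe667 in FY2 (272 residues deleted, Met added at position 1)
fy2_tyr = truncated_position(667, n_deleted=272)
# Tth Phe669 in FY4 (274 residues deleted, Met added at position 1)
fy4_tyr = truncated_position(669, n_deleted=274)
print(fy2_tyr, fy4_tyr)
```

Under these assumptions both mutations fall at position 396 of the truncated enzymes, in agreement with the positions stated in the text.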

The present invention also provides a gene encoding a DNA polymerase of the present invention. In order to assist
in the expression of the DNA polymerase activity, the modified gene preferably codes for a methionine residue at
position 1 of the new DNA polymerase. In addition, in one preferred embodiment of the invention, the modified gene
also codes for an alanine at position 2 (corresponding to position 272 of native Taq).

In a further aspect, the present invention provides a vector containing the gene encoding the DNA polymerase
activity of the present invention, e.g., encoding an amino acid sequence differentiated from native Taq in that it lacks
the N-terminal 272 amino acids and has phenylalanine at position 396 (equivalent to position 667 of Taq) replaced by
tyrosine or a substantially identical amino acid sequence thereto.

In a yet further aspect, the present invention provides a host cell comprising a vector containing the gene encoding
the DNA polymerase activity of the present invention, e.g., encoding an amino acid sequence differentiated from native
Taq in that it lacks the N-terminal 272 amino acids and has phenylalanine at position 396 (equivalent to position 667
of native Taq) replaced by tyrosine or a substantially identical amino acid sequence thereto.

The DNA polymerases of the present invention are preferably in a purified form. By purified form is meant that the
DNA polymerase is isolated from a majority of host cell proteins normally associated with it; preferably the polymerase
is at least 10% (w/w) of the protein of a preparation, even more preferably it is provided as a homogeneous preparation,
e.g., a homogeneous solution. Preferably the DNA polymerase is a single polypeptide on an SDS polyacrylamide gel.

The DNA polymerases of the present invention are suitably used in sequencing, preferably in combination with a
pyrophosphatase. Accordingly, the present invention provides a composition which comprises a DNA polymerase of
the present invention in combination with a pyrophosphatase, preferably a thermostable pyrophosphatase such as
Thermoplasma acidophilum pyrophosphatase (Schafer, G. and Richter, O.H. (1992) Eur. J. Biochem. 209, 351-355).

The DNA polymerases of the present invention can be constructed using standard techniques. By way of example,
mutagenic PCR primers can be designed to incorporate the desired Phe to Tyr amino acid change (FY mutation) in
one primer. In our hands these primers also carried restriction sites that are found internally in the sequence of the Taq
polymerase gene clone of Delta Taq, pWB253, which was used by us as template DNA. However, the same PCR
product can be generated with this primer pair from any clone of Taq or with genomic DNA isolated directly from
Thermus aquaticus. The PCR product encoding only part of the gene is then digested with the appropriate restriction
enzymes and used as a replacement sequence for the clone of Delta Taq digested with the same restriction enzymes.
In our hands the resulting plasmid was designated pWB253Y. The presence of the mutation can be verified by DNA
sequencing of the amplified region of the gene.

Further primers can be prepared that encode a methionine residue at the N-terminus that is not found at the
corresponding position of Taq, the sequence continuing with amino acid residue 273. These primers can be used with
a suitable plasmid, e.g., pWB253Y DNA, as a template for amplification and the amplified gene inserted into a vector,
e.g., pRE2, to create a gene, e.g., pRE273Y, encoding the polymerase (FY2). The entire gene can be verified by DNA
sequencing.

Improved expression of the DNA polymerases of the present invention in the pRE2 expression vector was obtained
by creating further genes, pREFY2pref (encoding a protein identical to FY2) and pREFY3 encoding FY3. A mutagenic
PCR primer was used to introduce silent codon changes (i.e., the amino acid encoded is not changed) at the amino
terminus of the protein. These changes led to increased production
of FY2 polymerase. FY3 was designed to promote increased translation efficiency in vivo. In addition to the silent codon
changes introduced in pREFY2pref, a GCT codon was added in the second position (SEQ. ID. NO. 2), as occurs
frequently in strongly expressed genes in E. coli. This adds an amino acid to the sequence of FY2, and hence the
protein was given its own designation FY3. Both constructs produce more enzyme than pRE273Y.

Silent codon changes such as the following increase protein production in E. coli:

replacement of the codon GAG with GAA;
replacement of the codon AGG, AGA, CGG or CGA with CGT or CGC;
replacement of the codon CTT, CTC, CTA, TTG or TTA with CTG;
replacement of the codon ATA with ATT or ATC;
replacement of the codon GGG or GGA with GGT or GGC.
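The silent changes listed above can be illustrated with a minimal sketch that recodes an open reading frame and confirms that the encoded protein is unchanged. Where the text allows two target codons (for example CGT or CGC), the sketch arbitrarily picks the first; that choice, the function names, and the short test sequence (the coding part of primer 5) are assumptions made for illustration. The sketch recodes every matching codon, which is not necessarily what was done in the actual constructs.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Less favoured codon -> replacement codon, following the list in the text.
PREFERRED = {
    "GAG": "GAA",
    "AGG": "CGT", "AGA": "CGT", "CGG": "CGT", "CGA": "CGT",
    "CTT": "CTG", "CTC": "CTG", "CTA": "CTG", "TTG": "CTG", "TTA": "CTG",
    "ATA": "ATT",
    "GGG": "GGT", "GGA": "GGT",
}

# Every replacement must be silent (same amino acid before and after).
assert all(CODON_TABLE[old] == CODON_TABLE[new] for old, new in PREFERRED.items())


def codons(seq: str) -> list[str]:
    """Split an in-frame coding sequence into codons."""
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq: str) -> str:
    """Translate an in-frame coding sequence with the standard code."""
    return "".join(CODON_TABLE[c] for c in codons(seq.upper()))


def recode(seq: str) -> str:
    """Replace each listed codon with its preferred synonym."""
    return "".join(PREFERRED.get(c, c) for c in codons(seq.upper()))


original = "ATGCTGGAGAGGCTTGAGTTT"   # coding part of primer 5
recoded = recode(original)
assert translate(original) == translate(recoded)
print(recoded, translate(recoded))
```

The translation check is the important part: any recoding table can be validated this way before a mutagenic primer is ordered.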

The present invention also provides a method for determining the nucleotide base sequence of a DNA molecule.
The method includes providing a DNA molecule annealed with a primer molecule able to hybridize to the DNA molecule;
and incubating the annealed molecules in a vessel containing at least one deoxynucleotide triphosphate, and a DNA
polymerase of the present invention. Also provided is at least one DNA synthesis terminating agent which terminates
DNA synthesis at a specific nucleotide base. The method further includes separating the DNA products of the incubating
reaction according to size, whereby at least a part of the nucleotide base sequence of the DNA molecule can be
determined.

In preferred embodiments, the sequencing is performed at a temperature above 50°C, 60°C, or 70°C.

In other preferred embodiments, the DNA polymerase has less than 1000, 250, 100, 50, 10 or even 2 units of
exonuclease activity per mg of polymerase (measured by standard procedure, see below) and is able to utilize primers
having only 4, 6 or 10 bases; and the concentration of all four deoxynucleoside triphosphates at the start of the
incubating step is sufficient to allow DNA synthesis to continue until terminated by the agent, e.g., a ddNTP.
For cycle sequencing, the DNA polymerases of the present invention make it possible to use significantly lower
amounts of dideoxynucleotides compared to naturally occurring enzymes. That is, the method involves providing an
excess of deoxynucleotides over each of the four dideoxynucleotides in a cycle sequencing reaction, and performing the
cycle sequencing reaction.

Preferably, more than 2, 5, 10 or even 100 fold excess of a dNTP is provided to the corresponding ddNTP.

In a related aspect, the invention features a kit or solution for DNA sequencing including a DNA polymerase of the
present invention and a reagent necessary for the sequencing such as dITP, deaza GTP, a chain terminating agent
such as a ddNTP, and a manganese-containing solution or powder and optionally a pyrophosphatase.

In another aspect, the invention features a method for providing a DNA polymerase of the present invention by
providing a nucleic acid sequence encoding the modified DNA polymerase, expressing the nucleic acid within a host
cell, and purifying the DNA polymerase from the host cell.

In another related aspect, the invention features a method for sequencing a strand of DNA essentially as described
above with one or more (preferably 2, 3 or 4) deoxyribonucleoside triphosphates, a DNA polymerase of the present
invention, and a first chain terminating agent. The DNA polymerase causes the primer to be elongated to form a first
series of first DNA products differing in the length of the elongated primer, each first DNA product having a chain
terminating agent at its elongated end, and the number of molecules of each first DNA product being approximately
the same for substantially all DNA products differing in length by no more than 20 bases. The method also features
providing a second chain terminating agent in the hybridized mixture at a concentration different from the first chain
terminating agent, wherein the DNA polymerase causes production of a second series of second DNA products differing
in the length of the elongated primer, with each second DNA product having the second chain terminating agent at its
elongated end. The number of molecules of each second DNA product is approximately the same for substantially all
second DNA products differing in length from each other by from 1 to 20 bases, and is distinctly different from the
number of molecules of all the first DNA products having a length differing by no more than 20 bases from that of said
second DNA products.

In preferred embodiments, three or four such chain terminating agents can be used to make different products
and the sequence reaction is provided with a magnesium ion, or even a manganese or iron ion (e.g., at a concentration
between 0.05 and 100 mM, preferably between 1 and 10 mM); and the DNA products are separated according to
molecular weight in four or fewer lanes of a gel.

In another related aspect, the invention features a method for sequencing a nucleic acid by combining an
oligonucleotide primer, a nucleic acid to be sequenced, between one and four deoxyribonucleoside triphosphates, a DNA
polymerase of the present invention, and at least two chain terminating agents in different amounts, under conditions
favoring extension of the oligonucleotide primer to form nucleic acid fragments complementary to the nucleic acid to
be sequenced. For example, the chain terminating agent may be a dideoxynucleotide terminator for adenine, guanine,
cytosine or thymine. The method further includes separating the nucleic acid fragments by size and determining the
nucleic acid sequence. The agents are differentiated from each other by intensity of a label in the primer extension
products.

While it is common to use gel electrophoresis to separate DNA products of a DNA sequencing reaction, those in
the art will recognize that other methods may also be used. Thus, it is possible to detect each of the different fragments
using procedures such as time of flight mass spectrometry, electron microscopy, and single molecule detection
methods.

The invention also features an automated DNA sequencing apparatus having a reactor including reagents which
provide at least two series of DNA products formed from a single primer and a DNA strand. Each DNA product of a
series differs in molecular weight and has a chain terminating agent at one end. The reagents include a DNA polymerase
of the present invention. The apparatus includes a separating means for separating the DNA product along one axis
of the separator to form a series of bands. It also includes a band reading means for determining the position and
intensity of each band after separation along the axis, and a computing means that determines the DNA sequence of
the DNA strand solely from the position and intensity of the bands along the axis and not from the wavelength of
emission of light from any label that may be present in the separating means.
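One way to picture a computing means that works from band position and intensity alone is the following minimal sketch. It assumes that each terminator was supplied at a concentration giving a characteristic relative band intensity; the specific intensity levels, the band data, and the function name are hypothetical and are not taken from the specification.

```python
import math


def call_bases(bands, levels):
    """Assign a base to each band from its intensity alone.

    bands:  iterable of (position, intensity) pairs from a single lane.
    levels: mapping of base -> expected relative intensity (hypothetical
            values; in practice they would be calibrated per experiment).
    Bands are read in order of position and each one is assigned the base
    whose expected intensity is closest on a logarithmic scale.
    """
    calls = []
    for _, intensity in sorted(bands):
        base = min(levels, key=lambda b: abs(math.log(intensity / levels[b])))
        calls.append(base)
    return "".join(calls)


# Hypothetical two-fold spacing between the four terminators.
levels = {"A": 8.0, "C": 4.0, "G": 2.0, "T": 1.0}
bands = [(3, 4.2), (1, 7.6), (4, 2.1), (2, 1.1)]
print(call_bases(bands, levels))
```

Such a scheme only works when band intensities within one series are nearly equal, which is why the uniform band intensities described for the polymerases of the invention matter for this approach.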

Other features and advantages of the invention will be apparent from the following description of the preferred
embodiments thereof, and from the claims.

Description of the Preferred Embodiments 
The drawings will first briefly be described. 


Drawings 

Figs. 1-4 are the DNA sequences, and corresponding amino acid sequences, of FY2, FY3, and the DNA polymerases
of Thermus flavus and Thermus thermophilus, respectively. Figure 5 is the DNA sequence and corresponding amino
acid sequence of FY4.

Examples 

The following examples serve to illustrate the DNA polymerases of the present invention and their use in sequencing.

Preparation of FY DNA Polymerases (FY2 and FY3) 


Bacterial Strains 

E. coli strains: MV1190 [Δ(srl-recA)306::Tn10, Δ(lac-proAB), thi, supE, F' (traD36 proAB+ lacIq lacZΔM15)];
DH1λ+ [gyrA96, recA1, relA1, endA1, thi-1, hsdR17, supE44, λ+]; M5248 [λ(bio275, cI857, cIII+, N+, Δ(H1))].


PCR 

Reaction conditions based on the procedure of Barnes (91 Proc. Nat'l. Acad. Sci. 2216-2220, 1994) were as follows:
20 mM Tricine pH 8.8, 85 mM KOAc, 200 µM dNTPs, 10% glycerol, 5% DMSO, 0.5 µM each primer, 1.5 mM MgOAc, 2.5
U HotTub (Amersham Life Science Inc.), 0.025 U Deep Vent (New England Biolabs), 1-100 ng target DNA per 100 µl
reaction. Cycling conditions were 94°C 30s, 68°C 10m40s for 8 cycles; then 94°C 30s, 68°C 12m00s for 8 cycles; then
94°C 30s, 68°C 13m20s for 8 cycles; then 94°C 30s, 68°C 14m40s for 8 cycles.
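The cycling programme above lengthens the extension step every 8 cycles, so its total duration is not obvious at a glance. The following sketch adds up the programmed hold times; it ignores ramp times between temperatures, which depend on the instrument, and the variable names are illustrative.

```python
# (cycles, denaturation seconds at 94°C, anneal/extension seconds at 68°C)
STAGES = [
    (8, 30, 10 * 60 + 40),
    (8, 30, 12 * 60 + 0),
    (8, 30, 13 * 60 + 20),
    (8, 30, 14 * 60 + 40),
]


def hold_time_seconds(stages):
    """Sum the programmed hold times over all stages (ramp times excluded)."""
    return sum(n * (denature + extend) for n, denature, extend in stages)


total = hold_time_seconds(STAGES)
hours, rem = divmod(total, 3600)
minutes, seconds = divmod(rem, 60)
print(f"{sum(n for n, _, _ in STAGES)} cycles, {hours} h {minutes} min {seconds} s of hold time")
```

The programme therefore comprises 32 cycles and a little over seven hours of hold time before any ramping is counted.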

In vitro mutagenesis 


Restriction enzyme digestions, plasmid preparations, and other in vitro manipulations of DNA were performed
using standard protocols (Sambrook et al., Molecular Cloning 2nd Ed. Cold Spring Harbor Press, 1989). PCR (see
protocol above) was used to introduce a Phe to Tyr amino acid change at codon 667 of native Taq DNA polymerase
(which is codon 396 of FY2). Oligonucleotide primer 1 dGCTTGGGCAGAGGATCCGCCGGG (SEQ. ID. NO. 3) spans
nucleotides 954 to 976 of the coding region of SEQ. ID. NO. 1 including a BamHI restriction site. Mutagenic oligo primer
2 dGGGATGGCTAGCTCCTGGGAGAGGCGGTGGGCCGACATGCCGTAGAGGACCCCGTAGTTGATGG (SEQ. ID.
NO. 4) spans nucleotides 1178 to 1241 including an NheI site and codon 396 of Sequence ID. NO. 1. A clone of exo-
Taq deleted for the first 235 amino acids, pWB253 encoding DeltaTaq polymerase (Barnes, 112 Gene 29-35, 1992),
was used as template DNA. Any clone of Taq polymerase or genomic DNA from Thermus aquaticus could also be
utilized to amplify the identical PCR product. The PCR product was digested with BamHI and NheI, and this fragment
was ligated to BamHI/NheI digested pWB253 plasmid to replace the corresponding fragment to create pWB253Y,
encoding polymerase FY1. Cells of E. coli strain MV1190 were used for transformation and induction of protein
expression, although any host strain carrying a lac repressor could be substituted. DNA sequencing verified the Phe to
Tyr change in the coding region.
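As a quick sanity check on primers such as primers 1 and 2, the following sketch scans each sequence for the recognition sites of the enzymes used in the cloning and compares primer length with the span of nucleotides quoted in the text. The recognition sequences are the well-known ones for these enzymes; the helper name and dictionary layout are illustrative assumptions.

```python
# Recognition sequences (5'->3') of the restriction enzymes used in the text.
SITES = {
    "BamHI": "GGATCC",
    "NheI": "GCTAGC",
    "NdeI": "CATATG",
    "KpnI": "GGTACC",
}

PRIMER_1 = "GCTTGGGCAGAGGATCCGCCGGG"
PRIMER_2 = ("GGGATGGCTAGCTCCTGGGAGAGGCGGTGGGCCGACATGCCGTAGA"
            "GGACCCCGTAGTTGATGG")


def find_sites(seq, sites=SITES):
    """Return {enzyme: 0-based index of first occurrence} for sites present."""
    seq = seq.upper()
    return {name: seq.find(site) for name, site in sites.items() if site in seq}


for name, primer, first, last in [("primer 1", PRIMER_1, 954, 976),
                                  ("primer 2", PRIMER_2, 1178, 1241)]:
    span = last - first + 1  # inclusive nucleotide numbering
    print(name, len(primer), span, find_sites(primer))
```

Primer 1 carries only the BamHI site and primer 2 only the NheI site, and both lengths match the quoted nucleotide spans (23 and 64 bases).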

PCR primer 3 dGGAATTCCATATGGACGATCTGAAGCTCTCC (SEQ. ID. NO. 5), spanning the start codon and
containing restriction enzyme sites, was used with PCR primer 4 dGGGGTACCAAGCTTCACTCCTTGGCGGAGAG
(SEQ. ID. NO. 6), containing restriction sites and spanning the stop codon (codon 562 of Sequence ID. NO. 1). A
methionine start codon and restriction enzyme recognition sequences were added to PCR primer 5
dGGAATTCCATATGCTGGAGAGGCTTGAGTTT (SEQ. ID. NO. 7), which was used with primer 4 above. PCR was performed using
the above primer pairs, and plasmid pWB253Y as template. The PCR products were digested with restriction enzymes
NdeI and KpnI and ligated to NdeI/KpnI digested vector pRE2 (Reddi et al., 17 Nucleic Acids Research 10,473-10,488,
1989) to make plasmids pRE236Y, encoding FY1 polymerase, and pRE273Y, encoding FY2 polymerase, respectively.
Cells of E. coli strain DH1λ+ were used for primary transformation with this and all subsequent pRE2 constructions, and
strain M5248 (λcI857) was used for protein expression, although any comparable pair of E. coli strains carrying the
cI+ and cI857 alleles could be utilized. Alternatively, any rec+ cI+ strain could be induced by chemical agents such as
nalidixic acid to produce the polymerase. The sequences of both genes were verified. pRE273Y was found to produce
a single polypeptide band on SDS polyacrylamide gels, unlike pRE253Y or pRE236Y.

Primer 6 dGGAATTCCATATGCTGGAACGTCTGGAGTTTGGCAGCCTCCTC (SEQ. ID. NO. 8) and primer 4 were
used to make a PCR product introducing silent changes in codon usage of FY2. The product was digested with NdeI/
BamHI and ligated to a pRE2 construct containing the 3' end of FY2 to create pREFY2pref, encoding FY2 DNA
polymerase. Primer 7 dGGAATTCCATATGGCTCTGGAACGTCTGGAGTTTGGCAGCCTCCTC (SEQ. ID. NO. 9) and primer
4 were used to make a PCR product introducing an additional alanine codon commonly occurring at the second position
of highly expressed genes. The NdeI/BamHI digested fragment was used as above to create pREFY3, encoding FY3
DNA polymerase.

Preparation of FY4 DNA Polymerase

Bacterial Strains

E. coli strains: DH1λ+ [gyrA96, recA1, relA1, endA1, thi-1, hsdR17, supE44, λ+]; M5248 [λ(bio275, cI857, cIII+,
N+, Δ(H1))].

PCR

Genomic DNA was prepared by standard techniques from Thermus thermophilus. The DNA polymerase gene of
Thermus thermophilus is known to reside on a 3 kilobase AlwNI fragment. To enrich for polymerase sequences in some
PCR reactions, the genomic DNA was digested prior to PCR with AlwNI, and fragments of approximately 3 kb were
selected by agarose gel electrophoresis to be used as template DNA. Reaction conditions were as follows: 10 mM Tris
pH 8.3, 50 mM KCl, SOO µM dNTPs, 0.001% gelatin, 1.0 µM each primer, 1.5 mM MgCl2, 2.5 U Tth, 0.025 U Deep Vent
(New England Biolabs), per 100 µl reaction. Cycling conditions were 94°C 2 min, then 35 cycles of 94°C 30s, 55°C
30s, 72°C 3 min, followed by 72°C for 7 min.

In vitro mutagenesis

Restriction enzyme digestions, plasmid preparations, and other in vitro manipulations of DNA were performed
using standard protocols (Sambrook et al., 1989). Plasmid pMR1 was constructed to encode an exonuclease-free
polymerase, with optimized codons for expression in E. coli at the 5' end. Primer 8 (SEQ. ID. NO. 10)
(GGAATTCCATATGCTGGAACGTCTGGAATTCGGCAGCCTC) was used with Primer 9 (SEQ. ID. NO. 11)
(GGGGTACCCTAACCCTTGGCGGAAAGCCAGTC) to create a PCR product from Tth genomic DNA, which was digested with restriction enzymes
NdeI and KpnI and inserted into plasmid pRE2 (Reddi et al., 1989, Nucleic Acids Research 17, 10473-10488) digested
with the same enzymes.

To create the desired F396Y mutation, two PCR products were made from Tth chromosomal DNA. Primer 8 above
was used in combination with Primer 10 (SEQ. ID. NO. 12)
(GGGATGGCTAGCTCCTGGGAGAGCCTATGGGCGGACATGCCGTAGAGGACGCCGTAGTTCACCG) to create a portion of the gene containing the F to Y amino
acid change as well as a silent change to create an NheI restriction site. Primer 11 (SEQ. ID. NO. 13)
(CTAGCTAGCCATCCCCTACGAAGAAGCGGTGGCCT) was used in combination with primer 9 above to create a
portion of the gene from the introduced NheI site to the stop codon at the 3' end of the coding sequence. The PCR
product of Primers 8 and 10 was digested with NdeI and NheI, and the PCR product of Primers 9 and 11 was digested
with NheI and KpnI. These were introduced into expression vector pRE2 which was digested with NdeI and KpnI to
produce plasmid pMR5. In addition to the desired changes, pMR5 was found to have a spurious change introduced
by PCR, which led to an amino acid substitution, K234R. Plasmid pMR8 was created to eliminate this substitution, by
replacing the AflII/BamHI fragment of pMR5 with the corresponding fragment from pMR1. The FY4 polymerase encoded
by plasmid pMR8 (SEQ. ID. NO. 14) is given in Figure 5.

Cells of E. coli strain DH1λ+ were used for primary transformation, and strain M5248 (λcI857) was used for protein
expression, although any comparable pair of E. coli strains carrying the cI+ and cI857 alleles could be utilized.
Alternatively, any rec+ cI+ strain could be induced by chemical agents such as nalidixic acid to produce the polymerase.

Protein Sequencing

Determinations of amino terminal protein sequences were performed at the W.M. Keck Foundation, Biotechnology 
Resource Laboratory, New Haven, Connecticut. 

Purification of Polymerases

A 1 liter culture of 2X LB (2% Bacto-Tryptone, 1% Bacto-Yeast Extract, 0.5% NaCl) + 0.2% Casamino Acids + 20
mM KPO4 pH 7.5 + 50 µg/ml Ampicillin was inoculated with a glycerol stock of the appropriate cell strain and grown
at 30°C with agitation until cells were in log phase (0.7-1.0 OD590). 9 liters of 2X LB + 0.2% Casamino Acids + 20 mM
KPO4 pH 7.5 + 0.05% Mazu Anti-foam was inoculated with 1 liter of log phase cells in 10 liter Microform Fermentors
(New Brunswick Scientific Co.). Cells were grown at 30°C under 15 psi pressure, 350-450 rpm agitation, and an air
flow rate of 14,000 cc/min ±1000 cc/min. When the OD690 reached 1.5-2.0, the cultures were induced by increasing
the temperature to 40-42°C for 90-120 minutes. The cultures were then cooled to < 20°C and the cells harvested by
centrifugation in a Sorvall RC-3B centrifuge at 5000 rpm at 4°C for 15-20 minutes. Harvested cells were stored at -80°C.

Frozen cells were broken into small pieces and resuspended in pre-warmed (90-95°C) Lysis Buffer (20 mM Tris
pH 8.5, 1 mM EDTA, 10 mM MgCl2, 16 mM (NH4)2SO4, 0.1% Tween 20, 0.1% Nonidet P-40, 1 mM PMSF).
Resuspended cells were then heated rapidly to 80°C and incubated at 80°C for 20 minutes with constant stirring. The
suspension was then rapidly cooled on ice. The cell debris was removed by centrifugation using a Sorvall GSA rotor at
10,000 rpm for 20 minutes at 4°C. The NaCl concentration of the supernatant was adjusted to 300 mM. The sample
was then passed through a diethylaminoethyl cellulose (Whatman DE-52) column that had been previously equilibrated
with Buffer A (20 mM Tris pH 8.5, 1 mM EDTA, 0.1% Tween 20, 0.1% Nonidet P-40, 300 mM NaCl, 10% glycerol, 1
mM DTT), and polymerase collected in the flow through. The sample was then diluted to a concentration of NaCl of
100 mM and applied to a Heparin-sepharose column. The polymerase was eluted from the column with a NaCl gradient
(100-500 mM NaCl). The sample was then dialyzed against Buffer B (20 mM Tris pH 8.5, 1 mM EDTA, 0.1% Tween
20, 0.1% Nonidet P-40, 10 mM KCl, 10% glycerol, 1 mM DTT) and further diluted as needed to lower the conductivity
of the sample to the conductivity of Buffer B. The sample was then applied to a diethylaminoethyl (Waters DEAE 15
HR) column and eluted with a 10-500 mM KCl gradient. The polymerase was then diluted with an equal volume of
Final Buffer (20 mM Tris pH 8.5, 0.1 mM EDTA, 0.5% Tween 20, 0.5% Nonidet P-40, 100 mM KCl, 50% glycerol, 1
mM DTT) and dialyzed against Final Buffer.

Assay of Exonuclease Activity

The exonuclease assay was performed by incubating 5 µl (25-150 units) of DNA polymerase with 5 µg of labelled
[3H]-pBR322 PCR fragment (1.6 x 10^4 cpm/µg DNA) in 100 µl of reaction buffer of 20 mM Tris-HCl pH 8.5, 5 mM MgCl2,
10 mM KCl, for 1 hour at 60°C. After this time interval, 200 µl of a 1:1 ratio of 50 µg/ml salmon sperm DNA with 2 mM
EDTA and 20% TCA with 2% sodium pyrophosphate were added into the assay aliquots. The aliquots were put on ice
for 10 min and then centrifuged at 12,000g for 10 min. Acid-soluble radioactivity in 200 µl of the supernatant was
quantitated by liquid scintillation counting. One unit of exonuclease activity was defined as the amount of enzyme that
catalyzed the acid solubilization of 10 nmol of total nucleotide in 30 min at 60°C.
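The unit definition above can be turned into a small calculation. The sketch below is a rough illustration only: it assumes that solubilization is linear in time (so a 1 hour incubation can be scaled to the 30 min unit definition), that the counted 200 µl is a representative fraction of the 300 µl final volume (100 µl reaction plus 200 µl of added solutions), and that an average nucleotide in DNA weighs about 330 g/mol. None of these conversion details, nor the function names, are given in the specification.

```python
def nmol_from_cpm(cpm_counted, counted_ul=200.0, total_ul=300.0,
                  cpm_per_ug=1.6e4, g_per_mol_nt=330.0):
    """Convert acid-soluble counts to nmol of nucleotide released.

    g_per_mol_nt is an assumed average mass per nucleotide, not a value
    from the specification.
    """
    total_cpm = cpm_counted * total_ul / counted_ul   # correct for sampling
    ug_dna = total_cpm / cpm_per_ug                   # specific activity
    return ug_dna * 1000.0 / g_per_mol_nt             # µg -> nmol nucleotide


def exonuclease_units(nmol_solubilized, minutes,
                      nmol_per_unit=10.0, unit_minutes=30.0):
    """Units of exonuclease, assuming the reaction is linear in time."""
    return (nmol_solubilized / nmol_per_unit) * (unit_minutes / minutes)


# Hypothetical reading: 16,000 cpm counted after the 60 min incubation.
released = nmol_from_cpm(16000.0)
print(round(released, 2), "nmol ->", round(exonuclease_units(released, 60.0), 3), "units")
```

Dividing the resulting units by the mass of polymerase assayed gives the units of exonuclease activity per mg of polymerase referred to earlier.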

Utility in DNA Sequencing 

Example 1: DNA Sequencing with FY Polymerases (e.g., FY2 and FY3)

The following components were added to a microcentrifuge vial (0.5 ml): 0.4 pmol M13 DNA (e.g., M13mp18, 1.0
µg); 2 µl Reaction Buffer (260 mM Tris-HCl, pH 9.5, 65 mM MgCl2); 2 µl of labeling nucleotide mixture (1.5 µM each
of dGTP, dCTP and dTTP); 0.5 µl (5 µCi) of [α-^P]dATP (about 2000 Ci/mmol); 1 µl -40 primer (0.5 µM; 0.5 pmol/µl;
5'-GTTTTCCCAGTCACGAC-3'); 2 µl of a mixture containing 4 U/µl FY polymerase and 6.6 U/ml Thermoplasma
acidophilum inorganic pyrophosphatase (32 U/µl polymerase and 53 U/ml pyrophosphatase in 20 mM Tris (pH 8.5), 100
mM KCl, 0.1 mM EDTA, 1 mM DTT, 0.5% NP-40, 0.5% TWEEN-20 and 50% glycerol, diluted 8 fold in dilution buffer
(10 mM Tris-HCl pH 8.0, 1 mM 2-mercaptoethanol, 0.5% TWEEN-20, 0.5% NP-40)); and water to a total volume of
17.5 µl. These components (the labeling reaction) were mixed and the vial was placed in a constant-temperature water
bath at 45°C for 5 minutes.

Four vials were labeled A, C, G, and T, and filled with 4 µl of the corresponding termination mix: ddA termination
mix (150 µM each dATP, dCTP, dGTP, dTTP, 1.5 µM ddATP); ddT termination mix (150 µM each dATP, dCTP, dGTP,
dTTP, 1.5 µM ddTTP); ddC termination mix (150 µM each dATP, dCTP, dGTP, dTTP, 1.5 µM ddCTP); ddG termination
mix (150 µM each dATP, dCTP, dGTP, dTTP, 1.5 µM ddGTP).
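The termination mixes above can be related to the fold excess of dNTP over ddNTP mentioned earlier in the description with a one-line calculation. The sketch below simply divides the two concentrations for each mix; the dictionary layout and function name are illustrative.

```python
# Termination mix -> (µM of each dNTP, µM of the ddNTP), as listed in the text.
TERMINATION_MIXES = {
    "ddA": (150.0, 1.5),
    "ddT": (150.0, 1.5),
    "ddC": (150.0, 1.5),
    "ddG": (150.0, 1.5),
}


def fold_excess(dntp_um: float, ddntp_um: float) -> float:
    """Molar excess of a dNTP over its corresponding ddNTP."""
    if ddntp_um <= 0:
        raise ValueError("ddNTP concentration must be positive")
    return dntp_um / ddntp_um


for mix, (dntp, ddntp) in TERMINATION_MIXES.items():
    print(f"{mix}: {fold_excess(dntp, ddntp):.0f}-fold excess of dNTP")
```

Each mix therefore supplies the dNTP at a 100-fold molar excess over the corresponding ddNTP, the upper end of the range given in the description.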

The labeling reaction was divided equally among the four termination vials (4 µl to each termination reaction vial),
and tightly capped.

The four vials were placed in a constant-temperature water bath at 72°C for 5 minutes. Then 4 µl of Stop Solution
(95% Formamide, 20 mM EDTA, 0.05% Bromophenol Blue, 0.05% Xylene Cyanol FF) was added to each vial, and the vials were heated
briefly to 70°-80°C immediately prior to loading on a sequencing gel (8% acrylamide, 8.3 M urea). Autoradiograms
required an 18-36 hour exposure using Kodak XAR-5 film or Amersham Hyperfilm MP. High-quality sequence results
with uniform band intensities were obtained. The band intensities were much more uniform than those obtained with
similar protocols using Taq DNA polymerase or ΔTaq DNA polymerase.

Example 2: DNA Cycle Sequencing with FY Polymerases

The following components were added to a microcentrifuge vial (0.5 ml) which is suitable for insertion into
a thermocycler machine (e.g., Perkin-Elmer DNA Thermal Cycler): 0.05 pmol or more M13 DNA (e.g., M13mp18, 0.1
µg), or 0.1 µg double-stranded plasmid DNA (e.g., pUC19); 2 µl Reaction Buffer (260 mM Tris-HCl, pH 9.5, 65 mM
MgCl2); 1 µl 3.0 µM dGTP; 1 µl 3.0 µM dTTP; 0.5 µl (5 µCi) of [α-^P]dATP (about 2000 Ci/mmol); 1 µl -40 primer (0.5
µM; 0.5 pmol/µl; 5'-GTTTTCCCAGTCACGAC-3'); 2 µl of a mixture containing 4 U/µl FY polymerase and 6.6 U/ml
Thermoplasma acidophilum inorganic pyrophosphatase (32 U/µl polymerase and 53 U/ml pyrophosphatase in 20 mM
Tris (pH 8.5), 100 mM KCl, 0.1 mM EDTA, 1 mM DTT, 0.5% NP-40, 0.5% TWEEN-20 and 50% glycerol, diluted 8 fold
in dilution buffer (10 mM Tris-HCl pH 8.0, 1 mM 2-mercaptoethanol, 0.5% TWEEN-20, 0.5% NP-40)); and water to a
total volume of 17.5 µl.

These components (labeling reaction mixture) were mixed and overlaid with 10 µl light mineral oil (Amersham).
The vial was placed in the thermocycler and 30-100 cycles (more than 60 cycles is unnecessary) from 45°C for 1
minute to 95°C for 0.5 minute performed. (Temperatures can be cycled from 55°-95°C, if desired.) The temperatures
may be adjusted if the melting temperature of the primer-template is significantly higher or lower, but these temperatures
work well for most primer-template combinations. This step can be completed in about 3 minutes per cycle.

Four vials were labeled A, C, G, and T, and filled with 4 µl of the corresponding termination mix: ddA termination
mix (150 µM each dATP, dCTP, dGTP, dTTP, 1.5 µM ddATP); ddT termination mix (150 µM each dATP, dCTP, dGTP,
dTTP, 1.5 µM ddTTP); ddC termination mix (150 µM each dATP, dCTP, dGTP, dTTP, 1.5 µM ddCTP); ddG termination
mix (150 µM each dATP, dCTP, dGTP, dTTP, 1.5 µM ddGTP). No additional enzyme is added to the termination vials.
The enzyme carried in from the prior (labeling) step is sufficient.
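The four termination mixes share one composition and differ only in the dideoxynucleotide. The following Python sketch tabulates them and computes the dNTP:ddNTP ratio; the dictionary layout and names are assumptions made for illustration, not part of the described method.

```python
# Concentrations as stated for each termination mix.
DNTP_EACH_UM = 150.0  # µM of each of dATP, dCTP, dGTP, dTTP
DDNTP_UM = 1.5        # µM of the single dideoxynucleotide in the mix

# One entry per vial label (A, C, G, T); illustrative structure only.
termination_mixes = {
    base: {
        "dNTP_each_uM": DNTP_EACH_UM,
        "ddNTP": f"dd{base}TP",
        "ddNTP_uM": DDNTP_UM,
    }
    for base in "ACGT"
}

# Molar ratio of each deoxynucleotide to the chain terminator.
dntp_to_ddntp_ratio = DNTP_EACH_UM / DDNTP_UM

for label, mix in termination_mixes.items():
    print(label, mix["ddNTP"], mix["dNTP_each_uM"], mix["ddNTP_uM"])
print("dNTP:ddNTP ratio =", dntp_to_ddntp_ratio)
```

Each mix therefore contains a 100-fold molar excess of each deoxynucleotide over its dideoxynucleotide.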

The cycled labeling reaction mixture was divided equally among the four termination vials (4 µl to each termination
reaction vial), and overlaid with 10 µl of light mineral oil.

The four vials were placed in the thermocycler and 30-200 cycles (more than 60 cycles is unnecessary) were performed,
each cycle consisting of 95°C for 15 seconds, 55°C for 30 seconds, and 72°C for 120 seconds. This step was conveniently
completed overnight. Other times and temperatures are also effective.
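As a rough check on why this step fits an overnight run, the hold times of the three-step cycle can be summed. The Python sketch below counts only the stated hold times and ignores ramping between temperatures, which depends on the instrument and is not given in the text; all names are illustrative.

```python
# (step name, temperature in °C, hold time in seconds) as stated in the text.
CYCLE_STEPS = [
    ("denature", 95, 15),
    ("anneal", 55, 30),
    ("extend", 72, 120),
]


def hold_time_hours(cycles):
    """Total programmed hold time in hours, excluding temperature ramps."""
    seconds_per_cycle = sum(seconds for _, _, seconds in CYCLE_STEPS)
    return cycles * seconds_per_cycle / 3600.0


for cycles in (30, 60, 200):
    print(f"{cycles} cycles: {hold_time_hours(cycles):.2f} h of hold time")
```

Under this assumption, 60 cycles need under 3 hours of hold time and 200 cycles a little over 9 hours.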

Six µl of reaction mixture was removed (avoiding oil), 3 µl of Stop Solution (95% Formamide, 20 mM EDTA, 0.05%
Bromophenol Blue, 0.05% Xylene Cyanol FF) was added, and the sample was heated briefly to 70°-80°C immediately
prior to loading on a sequencing gel. Autoradiograms required an 18-36 hour exposure using Kodak XAR-5 film or
Amersham Hyperfilm MP. High-quality sequence results with uniform band intensities were obtained. The band
intensities were much more uniform than those obtained with similar protocols using Taq DNA polymerase or ΔTaq
DNA polymerase.

Example 3: Sequencing with dGTP analogs to eliminate compression artifacts. 

For either of the sequencing methods outlined in Examples 1 and 2, 7-deaza-2'-deoxy-GTP can be substituted for
dGTP in the labeling and termination mixtures at exactly the same concentration as dGTP. When this substitution is
made, secondary structures on the gels are greatly reduced. Similarly, 2'-deoxyinosine triphosphate (dITP) can also be
substituted for dGTP, but its concentration must be 10-fold higher than the corresponding concentration of dGTP.
Substitution of dITP for dGTP is even more effective in eliminating compression artifacts than 7-deaza-dGTP.

Example 4: Other Sequencing Methods Using FY Polymerases

FY polymerases have been adapted for use with many other sequencing methods, including the use of fluorescent
primers and fluorescent dideoxy terminators for sequencing with the ABI 373A DNA sequencing instrument.


Example 5: SDS-Polyacrylamide Gel Electrophoresis 

Protein samples were run on a 14 X 16 mm 7.5 or 10% polyacrylamide gel. (Gels were predominantly 10% polyacrylamide
using a 14 X 16 mm Hoefer apparatus. Other sizes, apparatuses, and percentage gels are acceptable.
Similar results can also be obtained using the Pharmacia Phast Gel system with SDS, 8-25% gradient gels. Reagent
grade and ultrapure grade reagents were used.) The stacking gel consisted of 4% acrylamide (30:0.8, acrylamide:
bisacrylamide), 125 mM Tris-HCl pH 6.8, 0.1% Sodium Dodecyl Sulfate (SDS). The resolving gel consisted of 7.5 or
10% acrylamide (30:0.8, acrylamide:bisacrylamide), 375 mM Tris-HCl pH 8.8, 0.1% SDS. Running Buffer consisted
of 25 mM Tris, 192 mM Glycine and 0.1% SDS. 1X Sample Buffer consisted of 25 mM Tris-HCl pH 6.8, 0.25% SDS,
10% Glycerol, 0.1 M Dithiothreitol, 0.1% Bromophenol Blue, and 1 mM EDTA. A 1/4 volume of 5X Sample Buffer was
added to each sample. Samples were heated in sample buffer to 90-100°C for approximately 5 minutes prior to loading.
A 1.5 mm thick gel was run at 50-100 mA constant current for 1-3 hours (until bromophenol blue was close to the
bottom of the gel). The gel was stained with 0.025% Coomassie Blue R250 in 50% methanol, 10% acetic acid and
destained in 5% methanol, 7% acetic acid solution. A record of the gel was made by taking a photograph of the gel,
by drying the gel between cellulose film sheets, or by drying the gel onto filter paper under a vacuum.
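Adding a 1/4 volume of 5X Sample Buffer to a sample gives a 1X final concentration, because the buffer then makes up one fifth of the total volume. The Python sketch below checks this and derives the implied 5X composition by multiplying the stated 1X values by five; the 5X figures are derived here for illustration and are not stated in the text.

```python
# 1X Sample Buffer composition as stated in the text.
SAMPLE_BUFFER_1X = {
    "Tris-HCl pH 6.8 (mM)": 25.0,
    "SDS (%)": 0.25,
    "Glycerol (%)": 10.0,
    "Dithiothreitol (M)": 0.1,
    "Bromophenol Blue (%)": 0.1,
    "EDTA (mM)": 1.0,
}


def final_strength(stock_x, sample_volume, buffer_volume):
    """Final buffer strength (in X) after mixing buffer stock with sample."""
    return stock_x * buffer_volume / (sample_volume + buffer_volume)


# One volume of sample plus a 1/4 volume of 5X buffer.
strength = final_strength(stock_x=5.0, sample_volume=1.0, buffer_volume=0.25)

# Implied 5X stock composition (derived, not stated in the source).
sample_buffer_5x = {name: 5.0 * value for name, value in SAMPLE_BUFFER_1X.items()}

print("final strength:", strength, "X")
for name, value in sample_buffer_5x.items():
    print(name, value)
```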
Other embodiments are within the following claims. 
SEQUENCE LISTING

GENERAL INFORMATION:

(i) APPLICANT: AMERSHAM LIFE SCIENCE

(ii) TITLE OF INVENTION: THERMOSTABLE DNA POLYMERASES

(iii) NUMBER OF SEQUENCES: 14

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Lyon & Lyon
(B) STREET: 633 West Fifth Street, Suite 4700
(C) CITY: Los Angeles
(D) STATE: California
(E) COUNTRY: U.S.A.
(F) ZIP: 90071-2066

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: 3.5" Diskette, 1.44 Mb storage
(B) COMPUTER: IBM Compatible
(C) OPERATING SYSTEM: IBM P.C. DOS 5.0
(D) SOFTWARE: Word Perfect 5.1

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER: To Be Assigned
(B) FILING DATE:
(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:
Prior applications total, including application described below: one
(A) APPLICATION NUMBER: US 08/455,686
(B) FILING DATE: May 31, 1995

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Warburg, Richard J
(B) REGISTRATION NUMBER: 32,327
(C) REFERENCE/DOCKET NUMBER: 219/304-PCT

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: (213) 489-1600
(B) TELEFAX: (213) 955-0440
(C) TELEX: 67-3510

(2) INFORMATION FOR SEQ ID NO: 1:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1686 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ix) FEATURE:
(A) NAME/KEY: FY2
(B) LOCATION: 1...1683

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

ATG CTG GAG AGG CTT GAG TTT GGC AGC CTC CTC CAC GAG TTC GGC CTT 48
Met Leu Glu Arg Leu Glu Phe Gly Ser Leu Leu His Glu Phe Gly Leu
1 5 10 15

CTG GAA AGC CCC AAG GCC CTG GAG GAG GCC CCC TGG CCC CCG CCG GAA 96
Leu Glu Ser Pro Lys Ala Leu Glu Glu Ala Pro Trp Pro Pro Pro Glu
20 25 30

GGG GCC TTC GTG GGC TTT GTG CTT TCC CGC AAG GAG CCC ATG TGG GCC 144
Gly Ala Phe Val Gly Phe Val Leu Ser Arg Lys Glu Pro Met Trp Ala
35 40 45

GAT CTT CTG GCC CTG GCC GCC GCC AGG GGG GGC CGG GTC CAC CGG GCC 192
Asp Leu Leu Ala Leu Ala Ala Ala Arg Gly Gly Arg Val His Arg Ala
50 55 60

CCC GAG CCT TAT AAA GCC CTC AGG GAC CTG AAG GAG GCG CGG GGG CTT 240
Pro Glu Pro Tyr Lys Ala Leu Arg Asp Leu Lys Glu Ala Arg Gly Leu
65 70 75 80

CTC GCC AAA GAC CTG AGC GTT CTG GCC CTG AGG GAA GGC CTT GGC CTC 288
Leu Ala Lys Asp Leu Ser Val Leu Ala Leu Arg Glu Gly Leu Gly Leu
85 90 95

CCG CCC GGC GAC GAC CCC ATG CTC CTC GCC TAC CTC CTG GAC CCT TCC 336
Pro Pro Gly Asp Asp Pro Met Leu Leu Ala Tyr Leu Leu Asp Pro Ser
100 105 110

AAC ACC ACC CCC GAG GGG GTG GCC CGG CGC TAC GGC GGG GAG TGG ACG 384
Asn Thr Thr Pro Glu Gly Val Ala Arg Arg Tyr Gly Gly Glu Trp Thr
115 120 125

GAG GAG GCG GGG GAG CGG GCC GCC CTT TCC GAG AGG CTC TTC GCC AAC 432
Glu Glu Ala Gly Glu Arg Ala Ala Leu Ser Glu Arg Leu Phe Ala Asn
130 135 140

CTG TGG GGG AGG CTT GAG GGG GAG GAG AGG CTC CTT TGG CTT TAC CGG 480
Leu Trp Gly Arg Leu Glu Gly Glu Glu Arg Leu Leu Trp Leu Tyr Arg
145 150 155 160

GAG GTG GAG AGG CCC CTT TCC GCT GTC CTG GCC CAC ATG GAG GCC ACG 528
Glu Val Glu Arg Pro Leu Ser Ala Val Leu Ala His Met Glu Ala Thr
165 170 175

GGG GTG CGC CTG GAC GTG GCC TAT CTC AGG GCC TTG TCC CTG GAG GTG 576
Gly Val Arg Leu Asp Val Ala Tyr Leu Arg Ala Leu Ser Leu Glu Val
180 185 190

GCC GAG GAG ATC GCC CGC CTC GAG GCC GAG GTC TTC CGC CTG GCC GGC 624
Ala Glu Glu Ile Ala Arg Leu Glu Ala Glu Val Phe Arg Leu Ala Gly
195 200 205

CAC CCC TTC AAC CTC AAC TCC CGG GAC CAG CTG GAA AGG GTC CTC TTT 672
His Pro Phe Asn Leu Asn Ser Arg Asp Gln Leu Glu Arg Val Leu Phe
210 215 220

GAC GAG CTA GGG CTT CCC GCC ATC GGC AAG ACG GAG AAG ACC GGC AAG 720
Asp Glu Leu Gly Leu Pro Ala Ile Gly Lys Thr Glu Lys Thr Gly Lys
225 230 235 240

CGC TCC ACC AGC GCC GCC GTC CTG GAG GCC CTC CGC GAG GCC CAC CCC 768
Arg Ser Thr Ser Ala Ala Val Leu Glu Ala Leu Arg Glu Ala His Pro
245 250 255

ATC GTG GAG AAG ATC CTG CAG TAC CGG GAG CTC ACC AAG CTG AAG AGC 816
Ile Val Glu Lys Ile Leu Gln Tyr Arg Glu Leu Thr Lys Leu Lys Ser
260 265 270

ACC TAC ATT GAC CCC TTG CCG GAC CTC ATC CAC CCC AGG ACG GGC CGC 864
Thr Tyr Ile Asp Pro Leu Pro Asp Leu Ile His Pro Arg Thr Gly Arg
275 280 285

CTC CAC ACC CGC TTC AAC CAG ACG GCC ACG GCC ACG GGC AGG CTA AGT 912
Leu His Thr Arg Phe Asn Gln Thr Ala Thr Ala Thr Gly Arg Leu Ser
290 295 300

AGC TCC GAT CCC AAC CTC CAG AAC ATC CCC GTC CGC ACC CCG CTT GGG 960
Ser Ser Asp Pro Asn Leu Gln Asn Ile Pro Val Arg Thr Pro Leu Gly
305 310 315 320

CAG AGG ATC CGC CGG GCC TTC ATC GCC GAG GAG GGG TGG CTA TTG GTG 1008
Gln Arg Ile Arg Arg Ala Phe Ile Ala Glu Glu Gly Trp Leu Leu Val
325 330 335

GCC CTG GAC TAT AGC CAG ATA GAG CTC AGG GTG CTG GCC CAC CTC TCC 1056
Ala Leu Asp Tyr Ser Gln Ile Glu Leu Arg Val Leu Ala His Leu Ser
340 345 350

GGC GAC GAG AAC CTG ATC CGG GTC TTC CAG GAG GGG CGG GAC ATC CAC 1104
Gly Asp Glu Asn Leu Ile Arg Val Phe Gln Glu Gly Arg Asp Ile His
355 360 365

ACG GAG ACC GCC AGC TGG ATG TTC GGC GTC CCC CGG GAG GCC GTG GAC 1152
Thr Glu Thr Ala Ser Trp Met Phe Gly Val Pro Arg Glu Ala Val Asp
370 375 380

CCC CTG ATG CGC CGG GCG GCC AAG ACC ATC AAC TAC GGG GTC CTC TAC 1200
Pro Leu Met Arg Arg Ala Ala Lys Thr Ile Asn Tyr Gly Val Leu Tyr
385 390 395 400

GGC ATG TCG GCC CAC CGC CTC TCC CAG GAG CTA GCC ATC CCT TAC GAG 1248
Gly Met Ser Ala His Arg Leu Ser Gln Glu Leu Ala Ile Pro Tyr Glu
405 410 415

GAG GCC CAG GCC TTC ATT GAG CGC TAC TTT CAG AGC TTC CCC AAG GTG 1296
Glu Ala Gln Ala Phe Ile Glu Arg Tyr Phe Gln Ser Phe Pro Lys Val
420 425 430

CGG GCC TGG ATT GAG AAG ACC CTG GAG GAG GGC AGG AGG CGG GGG TAC 1344
Arg Ala Trp Ile Glu Lys Thr Leu Glu Glu Gly Arg Arg Arg Gly Tyr
435 440 445

GTG GAG ACC CTC TTC GGC CGC CGC CGC TAC GTG CCA GAC CTA GAG GCC 1392
Val Glu Thr Leu Phe Gly Arg Arg Arg Tyr Val Pro Asp Leu Glu Ala
450 455 460

CGG GTG AAG AGC GTG CGG GAG GCG GCC GAG CGC ATG GCC TTC AAC ATG 1440
Arg Val Lys Ser Val Arg Glu Ala Ala Glu Arg Met Ala Phe Asn Met
465 470 475 480

CCC GTC CAG GGC ACC GCC GCC GAC CTC ATG AAG CTG GCT ATG GTG AAG 1488
Pro Val Gln Gly Thr Ala Ala Asp Leu Met Lys Leu Ala Met Val Lys
485 490 495

CTC TTC CCC AGG CTG GAG GAA ATG GGG GCC AGG ATG CTC CTT CAG GTC 1536
Leu Phe Pro Arg Leu Glu Glu Met Gly Ala Arg Met Leu Leu Gln Val
500 505 510

CAC GAC GAG CTG GTC CTC GAG GCC CCA AAA GAG AGG GCG GAG GCC GTG 1584
His Asp Glu Leu Val Leu Glu Ala Pro Lys Glu Arg Ala Glu Ala Val
515 520 525

GCC CGG CTG GCC AAG GAG GTC ATG GAG GGG GTG TAT CCC CTG GCC GTG 1632
Ala Arg Leu Ala Lys Glu Val Met Glu Gly Val Tyr Pro Leu Ala Val
530 535 540

CCC CTG GAG GTG GAG GTG GGG ATA GGG GAG GAC TGG CTC TCC GCC AAG 1680
Pro Leu Glu Val Glu Val Gly Ile Gly Glu Asp Trp Leu Ser Ala Lys
545 550 555 560

GAG TGA 1686
Glu *



(2) INFORMATION FOR SEQ ID NO: 2:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1689 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ix) FEATURE:
(A) NAME/KEY: FY3
(B) LOCATION: 1...1686

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

ATG GCT CTG GAA CGT CTG GAG TTT GGC AGC CTC CTC CAC GAG TTC GGC 48
Met Ala Leu Glu Arg Leu Glu Phe Gly Ser Leu Leu His Glu Phe Gly
1 5 10 15

CTT CTG GAA AGC CCC AAG GCC CTG GAG GAG GCC CCC TGG CCC CCG CCG 96
Leu Leu Glu Ser Pro Lys Ala Leu Glu Glu Ala Pro Trp Pro Pro Pro
20 25 30

GAA GGG GCC TTC GTG GGC TTT GTG CTT TCC CGC AAG GAG CCC ATG TGG 144
Glu Gly Ala Phe Val Gly Phe Val Leu Ser Arg Lys Glu Pro Met Trp
35 40 45

GCC GAT CTT CTG GCC CTG GCC GCC GCC AGG GGG GGC CGG GTC CAC CGG 192
Ala Asp Leu Leu Ala Leu Ala Ala Ala Arg Gly Gly Arg Val His Arg
50 55 60

GCC CCC GAG CCT TAT AAA GCC CTC AGG GAC CTG AAG GAG GCG CGG GGG 240
Ala Pro Glu Pro Tyr Lys Ala Leu Arg Asp Leu Lys Glu Ala Arg Gly
65 70 75 80

CTT CTC GCC AAA GAC CTG AGC GTT CTG GCC CTG AGG GAA GGC CTT GGC 288
Leu Leu Ala Lys Asp Leu Ser Val Leu Ala Leu Arg Glu Gly Leu Gly
85 90 95

CTC CCG CCC GGC GAC GAC CCC ATG CTC CTC GCC TAC CTC CTG GAC CCT 336
Leu Pro Pro Gly Asp Asp Pro Met Leu Leu Ala Tyr Leu Leu Asp Pro
100 105 110

TCC AAC ACC ACC CCC GAG GGG GTG GCC CGG CGC TAC GGC GGG GAG TGG 384
Ser Asn Thr Thr Pro Glu Gly Val Ala Arg Arg Tyr Gly Gly Glu Trp
115 120 125

ACG GAG GAG GCG GGG GAG CGG GCC GCC CTT TCC GAG AGG CTC TTC GCC 432
Thr Glu Glu Ala Gly Glu Arg Ala Ala Leu Ser Glu Arg Leu Phe Ala
130 135 140

AAC CTG TGG GGG AGG CTT GAG GGG GAG GAG AGG CTC CTT TGG CTT TAC 480
Asn Leu Trp Gly Arg Leu Glu Gly Glu Glu Arg Leu Leu Trp Leu Tyr
145 150 155 160

CGG GAG GTG GAG AGG CCC CTT TCC GCT GTC CTG GCC CAC ATG GAG GCC 528
Arg Glu Val Glu Arg Pro Leu Ser Ala Val Leu Ala His Met Glu Ala
165 170 175

ACG GGG GTG CGC CTG GAC GTG GCC TAT CTC AGG GCC TTG TCC CTG GAG 576
Thr Gly Val Arg Leu Asp Val Ala Tyr Leu Arg Ala Leu Ser Leu Glu
180 185 190

GTG GCC GAG GAG ATC GCC CGC CTC GAG GCC GAG GTC TTC CGC CTG GCC 624
Val Ala Glu Glu Ile Ala Arg Leu Glu Ala Glu Val Phe Arg Leu Ala
195 200 205

GGC CAC CCC TTC AAC CTC AAC TCC CGG GAC CAG CTG GAA AGG GTC CTC 672
Gly His Pro Phe Asn Leu Asn Ser Arg Asp Gln Leu Glu Arg Val Leu
210 215 220

TTT GAC GAG CTA GGG CTT CCC GCC ATC GGC AAG ACG GAG AAG ACC GGC 720
Phe Asp Glu Leu Gly Leu Pro Ala Ile Gly Lys Thr Glu Lys Thr Gly
225 230 235 240

AAG CGC TCC ACC AGC GCC GCC GTC CTG GAG GCC CTC CGC GAG GCC CAC 768
Lys Arg Ser Thr Ser Ala Ala Val Leu Glu Ala Leu Arg Glu Ala His
245 250 255

CCC ATC GTG GAG AAG ATC CTG CAG TAC CGG GAG CTC ACC AAG CTG AAG 816
Pro Ile Val Glu Lys Ile Leu Gln Tyr Arg Glu Leu Thr Lys Leu Lys
260 265 270

AGC ACC TAC ATT GAC CCC TTG CCG GAC CTC ATC CAC CCC AGG ACG GGC 864
Ser Thr Tyr Ile Asp Pro Leu Pro Asp Leu Ile His Pro Arg Thr Gly
275 280 285

CGC CTC CAC ACC CGC TTC AAC CAG ACG GCC ACG GCC ACG GGC AGG CTA 912
Arg Leu His Thr Arg Phe Asn Gln Thr Ala Thr Ala Thr Gly Arg Leu
290 295 300

AGT AGC TCC GAT CCC AAC CTC CAG AAC ATC CCC GTC CGC ACC CCG CTT 960
Ser Ser Ser Asp Pro Asn Leu Gln Asn Ile Pro Val Arg Thr Pro Leu
305 310 315 320

GGG CAG AGG ATC CGC CGG GCC TTC ATC GCC GAG GAG GGG TGG CTA TTG 1008
Gly Gln Arg Ile Arg Arg Ala Phe Ile Ala Glu Glu Gly Trp Leu Leu
325 330 335

GTG GCC CTG GAC TAT AGC CAG ATA GAG CTC AGG GTG CTG GCC CAC CTC 1056
Val Ala Leu Asp Tyr Ser Gln Ile Glu Leu Arg Val Leu Ala His Leu
340 345 350

TCC GGC GAC GAG AAC CTG ATC CGG GTC TTC CAG GAG GGG CGG GAC ATC 1104
Ser Gly Asp Glu Asn Leu Ile Arg Val Phe Gln Glu Gly Arg Asp Ile
355 360 365

CAC ACG GAG ACC GCC AGC TGG ATG TTC GGC GTC CCC CGG GAG GCC GTG 1152
His Thr Glu Thr Ala Ser Trp Met Phe Gly Val Pro Arg Glu Ala Val
370 375 380

GAC CCC CTG ATG CGC CGG GCG GCC AAG ACC ATC AAC TAC GGG GTC CTC 1200
Asp Pro Leu Met Arg Arg Ala Ala Lys Thr Ile Asn Tyr Gly Val Leu
385 390 395 400

TAC GGC ATG TCG GCC CAC CGC CTC TCC CAG GAG CTA GCC ATC CCT TAC 1248
Tyr Gly Met Ser Ala His Arg Leu Ser Gln Glu Leu Ala Ile Pro Tyr
405 410 415

GAG GAG GCC CAG GCC TTC ATT GAG CGC TAC TTT CAG AGC TTC CCC AAG 1296
Glu Glu Ala Gln Ala Phe Ile Glu Arg Tyr Phe Gln Ser Phe Pro Lys
420 425 430

GTG CGG GCC TGG ATT GAG AAG ACC CTG GAG GAG GGC AGG AGG CGG GGG 1344
Val Arg Ala Trp Ile Glu Lys Thr Leu Glu Glu Gly Arg Arg Arg Gly
435 440 445

TAC GTG GAG ACC CTC TTC GGC CGC CGC CGC TAC GTG CCA GAC CTA GAG 1392
Tyr Val Glu Thr Leu Phe Gly Arg Arg Arg Tyr Val Pro Asp Leu Glu
450 455 460

GCC CGG GTG AAG AGC GTG CGG GAG GCG GCC GAG CGC ATG GCC TTC AAC 1440
Ala Arg Val Lys Ser Val Arg Glu Ala Ala Glu Arg Met Ala Phe Asn
465 470 475 480

ATG CCC GTC CAG GGC ACC GCC GCC GAC CTC ATG AAG CTG GCT ATG GTG 1488
Met Pro Val Gln Gly Thr Ala Ala Asp Leu Met Lys Leu Ala Met Val
485 490 495

AAG CTC TTC CCC AGG CTG GAG GAA ATG GGG GCC AGG ATG CTC CTT CAG 1536
Lys Leu Phe Pro Arg Leu Glu Glu Met Gly Ala Arg Met Leu Leu Gln
500 505 510

GTC CAC GAC GAG CTG GTC CTC GAG GCC CCA AAA GAG AGG GCG GAG GCC 1584
Val His Asp Glu Leu Val Leu Glu Ala Pro Lys Glu Arg Ala Glu Ala
515 520 525

GTG GCC CGG CTG GCC AAG GAG GTC ATG GAG GGG GTG TAT CCC CTG GCC 1632
Val Ala Arg Leu Ala Lys Glu Val Met Glu Gly Val Tyr Pro Leu Ala
530 535 540

GTG CCC CTG GAG GTG GAG GTG GGG ATA GGG GAG GAC TGG CTC TCC GCC 1680
Val Pro Leu Glu Val Glu Val Gly Ile Gly Glu Asp Trp Leu Ser Ala
545 550 555 560

AAG GAG TGA 1689
Lys Glu *



(2) INFORMATION FOR SEQ ID NO: 3:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 23 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
GCTTGGGCAG AGGATCCGCC GGG

(2) INFORMATION FOR SEQ ID NO: 4:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 64 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
GGGATGGCTA GCTCCTGGGA GAGGCGGTGG GCCGACATGC CGTAGAGGAC
CCCGTAGTTG ATGG

(2) INFORMATION FOR SEQ ID NO: 5:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 31 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
GGAATTCCAT ATGGACGATC TGAAGCTCTC C

(2) INFORMATION FOR SEQ ID NO: 6:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 31 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
GGGGTACCAA GCTTCACTCC TTGGCGGAGA G



(2) INFORMATION FOR SEQ ID NO: 7:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 31 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
GGAATTCCAT ATGCTGGAGA GGCTTGAGTT T

(2) INFORMATION FOR SEQ ID NO: 8:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 43 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
GGAATTCCAT ATGCTGGAAC GTCTGGAGTT TGGCAGCCTC CTC

(2) INFORMATION FOR SEQ ID NO: 9:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 46 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
GGAATTCCAT ATGGCTCTGG AACGTCTGGA GTTTGGCAGC CTCCTC

(2) INFORMATION FOR SEQ ID NO: 10:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 40 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
GGAATTCCAT ATGCTGGAAC GTCTGGAATT CGGCAGCCTC

(2) INFORMATION FOR SEQ ID NO: 11:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 32 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
GGGGTACCCT AACCCTTGGC GGAAAGCCAG TC

(2) INFORMATION FOR SEQ ID NO: 12:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 64 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
GGGATGGCTA GCTCCTGGGA GAGCCTATGG GCGGACATGC CGTAGAGGAC
GCCGTAGTTC ACCG

(2) INFORMATION FOR SEQ ID NO: 13:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 35 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
CTAGCTAGCC ATCCCCTACG AAGAAGCGGT GGCCT

(2) INFORMATION FOR SEQ ID NO: 14:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1686 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ix) FEATURE:
(A) NAME/KEY: FY4
(B) LOCATION: 1...1683

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

ATG CTG GAA CGT CTG GAA TTC GGC AGC CTC CTC CAC GAG TTC GGC CTC 48
Met Leu Glu Arg Leu Glu Phe Gly Ser Leu Leu His Glu Phe Gly Leu
1 5 10 15

CTG GAG GCC CCC GCC CCC CTG GAG GAG GCC CCC TGG CCC CCG CCG GAA 96
Leu Glu Ala Pro Ala Pro Leu Glu Glu Ala Pro Trp Pro Pro Pro Glu
20 25 30

GGG GCC TTC GTG GGC TTC GTC CTC TCC CGC CCC GAG CCC ATG TGG GCG 144
Gly Ala Phe Val Gly Phe Val Leu Ser Arg Pro Glu Pro Met Trp Ala
35 40 45

GAG CTT AAA GCC CTG GCC GCC TGC AGG GAC GGC CGG GTG CAC CGG GCA 192
Glu Leu Lys Ala Leu Ala Ala Cys Arg Asp Gly Arg Val His Arg Ala
50 55 60

GCA GAC CCC TTG GCG GGG CTA AAG GAC CTC AAG GAG GTC CGG GGC CTC 240
Ala Asp Pro Leu Ala Gly Leu Lys Asp Leu Lys Glu Val Arg Gly Leu
65 70 75 80

CTC GCC AAG GAC CTC GCC GTC TTG GCC TCG AGG GAG GGG CTA GAC CTC 288
Leu Ala Lys Asp Leu Ala Val Leu Ala Ser Arg Glu Gly Leu Asp Leu
85 90 95

GTG CCC GGG GAC GAC CCC ATG CTC CTC GCC TAC CTC CTG GAC CCC TCC 336
Val Pro Gly Asp Asp Pro Met Leu Leu Ala Tyr Leu Leu Asp Pro Ser
100 105 110

AAC ACC ACC CCC GAG GGG GTG GCG CGG CGC TAC GGG GGG GAG TGG ACG 384
Asn Thr Thr Pro Glu Gly Val Ala Arg Arg Tyr Gly Gly Glu Trp Thr
115 120 125

GAG GAC GCC GCC CAC CGG GCC CTC CTC TCG GAG AGG CTC CAT CGG AAC 432
Glu Asp Ala Ala His Arg Ala Leu Leu Ser Glu Arg Leu His Arg Asn
130 135 140

CTC CTT AAG CGC CTC GAG GGG GAG GAG AAG CTC CTT TGG CTC TAC CAC 480
Leu Leu Lys Arg Leu Glu Gly Glu Glu Lys Leu Leu Trp Leu Tyr His
145 150 155 160

GAG GTG GAA AAG CCC CTC TCC CGG GTC CTG GCC CAC ATG GAG GCC ACC 528
Glu Val Glu Lys Pro Leu Ser Arg Val Leu Ala His Met Glu Ala Thr
165 170 175

GGG GTA CGG CTG GAC GTG GCC TAC CTT CAG GCC CTT TCC CTG GAG CTT 576
Gly Val Arg Leu Asp Val Ala Tyr Leu Gln Ala Leu Ser Leu Glu Leu
180 185 190

GCG GAG GAG ATC CGC CGC CTC GAG GAG GAG GTC TTC CGC TTG GCG GGC 624
Ala Glu Glu Ile Arg Arg Leu Glu Glu Glu Val Phe Arg Leu Ala Gly
195 200 205

CAC CCC TTC AAC CTC AAC TCC CGG GAC CAG CTG GAA AGG GTG CTC TTT 672
His Pro Phe Asn Leu Asn Ser Arg Asp Gln Leu Glu Arg Val Leu Phe
210 215 220

GAC GAG CTT AGG CTT CCC GCC TTG GGG AAG ACG CAA AAG ACA GGC AAG 720
Asp Glu Leu Arg Leu Pro Ala Leu Gly Lys Thr Gln Lys Thr Gly Lys
225 230 235 240

CGC TCC ACC AGC GCC GCG GTG CTG GAG GCC CTA CGG GAG GCC CAC CCC 768
Arg Ser Thr Ser Ala Ala Val Leu Glu Ala Leu Arg Glu Ala His Pro
245 250 255

ATC GTG GAG AAG ATC CTC CAG CAC CGG GAG CTC ACC AAG CTC AAG AAC 816
Ile Val Glu Lys Ile Leu Gln His Arg Glu Leu Thr Lys Leu Lys Asn
260 265 270

ACC TAC GTG GAC CCC CTC CCA AGC CTC GTC CAC CCG AGG ACG GGC CGC 864
Thr Tyr Val Asp Pro Leu Pro Ser Leu Val His Pro Arg Thr Gly Arg
275 280 285

CTC CAC ACC CGC TTC AAC CAG ACG GCC ACG GCC ACG GGG AGG CTT AGT 912
Leu His Thr Arg Phe Asn Gln Thr Ala Thr Ala Thr Gly Arg Leu Ser
290 295 300

AGC TCC GAC CCC AAC CTG CAG AAC ATC CCC GTC CGC ACC CCC TTG GGC 960
Ser Ser Asp Pro Asn Leu Gln Asn Ile Pro Val Arg Thr Pro Leu Gly
305 310 315 320

CAG AGG ATC CGC CGG GCC TTC GTG GCC GAG GCG GGT TGG GCG TTG GTG 1008
Gln Arg Ile Arg Arg Ala Phe Val Ala Glu Ala Gly Trp Ala Leu Val
325 330 335

GCC CTG GAC TAT AGC CAG ATA GAG CTC CGC GTC CTC GCC CAC CTC TCC 1056
Ala Leu Asp Tyr Ser Gln Ile Glu Leu Arg Val Leu Ala His Leu Ser
340 345 350

GGG GAC GAA AAC CTG ATC AGG GTC TTC CAG GAG GGG AAG GAC ATC CAC 1104
Gly Asp Glu Asn Leu Ile Arg Val Phe Gln Glu Gly Lys Asp Ile His
355 360 365

ACC CAG ACC GCA AGC TGG ATG TTC GGC GTC CCC CCG GAG GCC GTG GAC 1152
Thr Gln Thr Ala Ser Trp Met Phe Gly Val Pro Pro Glu Ala Val Asp
370 375 380

CCC CTG ATG CGC CGG GCG GCC AAG ACG GTG AAC TAC GGC GTC CTC TAC 1200
Pro Leu Met Arg Arg Ala Ala Lys Thr Val Asn Tyr Gly Val Leu Tyr
385 390 395 400

GGC ATG TCC GCC CAT AGG CTC TCC CAG GAG CTA GCC ATC CCC TAC GAA 1248
Gly Met Ser Ala His Arg Leu Ser Gln Glu Leu Ala Ile Pro Tyr Glu
405 410 415

GAA GCG GTG GCC TTT ATA GAG CGC TAC TTC CAA AGC TTC CCC AAG GTG 1296
Glu Ala Val Ala Phe Ile Glu Arg Tyr Phe Gln Ser Phe Pro Lys Val
420 425 430

CGG GCC TGG ATA GAA AAG ACC CTG GAG GAG GGG AGG AAG CGG GGC TAC 1344
Arg Ala Trp Ile Glu Lys Thr Leu Glu Glu Gly Arg Lys Arg Gly Tyr
435 440 445

GTG GAA ACC CTC TTC GGA AGA AGG CGC TAC GTG CCC GAC CTC AAC GCC 1392
Val Glu Thr Leu Phe Gly Arg Arg Arg Tyr Val Pro Asp Leu Asn Ala
450 455 460

CGG GTG AAG AGC GTC AGG GAG GCC GCG GAG CGC ATG GCC TTC AAC ATG 1440
Arg Val Lys Ser Val Arg Glu Ala Ala Glu Arg Met Ala Phe Asn Met
465 470 475 480

CCC GTC CAG GGC ACC GCC GCC GAC CTC ATG AAG CTC GCC ATG GTG AAG 1488
Pro Val Gln Gly Thr Ala Ala Asp Leu Met Lys Leu Ala Met Val Lys
485 490 495

CTC TTC CCC CGC CTC CGG GAG ATG GGG GCC CGC ATG CTC CTC CAG GTC 1536
Leu Phe Pro Arg Leu Arg Glu Met Gly Ala Arg Met Leu Leu Gln Val
500 505 510

CAC GAC GAG CTC CTC CTG GAG GCC CCC CAA GCG CGG GCC GAG GAG GTG 1584
His Asp Glu Leu Leu Leu Glu Ala Pro Gln Ala Arg Ala Glu Glu Val
515 520 525

GCG GCT TTG GCC AAG GAG GCC ATG GAG AAG GCC TAT CCC CTC GCC GTG 1632
Ala Ala Leu Ala Lys Glu Ala Met Glu Lys Ala Tyr Pro Leu Ala Val
530 535 540

CCC CTG GAG GTG GAG GTG GGG ATG GGG GAG GAC TGG CTT TCC GCC AAG 1680
Pro Leu Glu Val Glu Val Gly Met Gly Glu Asp Trp Leu Ser Ala Lys
545 550 555 560

GGT TAG 1686
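
The relationship between the short oligonucleotides and the polymerase coding sequences in this listing can be checked by string comparison. The Python sketch below is an illustrative check only, not part of the listing: it tests whether the 3' end of SEQ ID NO: 7 matches the first seven codons of SEQ ID NO: 1, and whether the 3' end of SEQ ID NO: 6 is the reverse complement of the last six codons of SEQ ID NO: 1. The variable names are assumptions, and the sequences are copied from the listing with spaces removed.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# Oligonucleotides copied from the listing (spaces removed).
SEQ_ID_6 = "GGGGTACCAAGCTTCACTCCTTGGCGGAGAG"
SEQ_ID_7 = "GGAATTCCATATGCTGGAGAGGCTTGAGTTT"

# First seven and last six codons of SEQ ID NO: 1 (FY2).
FY2_START = "ATGCTGGAGAGGCTTGAGTTT"  # ATG CTG GAG AGG CTT GAG TTT
FY2_END = "CTCTCCGCCAAGGAGTGA"       # CTC TCC GCC AAG GAG TGA

forward_matches = SEQ_ID_7.endswith(FY2_START)
reverse_matches = SEQ_ID_6.endswith(reverse_complement(FY2_END))

print("SEQ ID NO: 7 ends with the start of SEQ ID NO: 1:", forward_matches)
print("SEQ ID NO: 6 ends with the reverse complement of the end of SEQ ID NO: 1:",
      reverse_matches)
print("lengths:", len(SEQ_ID_6), len(SEQ_ID_7))
```

Both oligonucleotides are 31 bases long, in agreement with the lengths stated in their SEQUENCE CHARACTERISTICS entries.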
Claims

1. An enzymatically active DNA polymerase having between 540 and 582 amino acids having a tyrosine at a position
equivalent to position 667 of Taq DNA polymerase, wherein said polymerase lacks 5' to 3' exonuclease activity,
and wherein said polymerase has at least 95% homology in its amino acid sequence to the DNA polymerase of
Thermus aquaticus, Thermus flavus or Thermus thermophilus, and wherein said polymerase forms a single
polypeptide band on an SDS polyacrylamide gel.

2. The polymerase of claim 1 wherein the amino acid sequence of said polymerase includes less than 3 conservative
amino acid changes compared to one said DNA polymerase of said named Thermus species.

3. The polymerase of claim 1 wherein the amino acid sequence of said polymerase includes less than 3 additional
amino acids compared to one said DNA polymerase of said named Thermus species at its N-terminus.

4. The polymerase of claim 1 selected from the group consisting of FY2, FY3 and FY4.

5. Purified nucleic acid encoding the DNA polymerase of any of claims 1-4.

6. Method for sequencing DNA comprising the step of generating chain terminated fragments from the DNA template
to be sequenced with a DNA polymerase of any of claims 1-4 in the presence of at least one chain terminating
agent and one or more nucleotide triphosphates, and determining the sequence of said DNA from the sizes of said
fragments.

7. Kit for sequencing DNA comprising a DNA polymerase of any of claims 1-4 and a pyrophosphatase.

8. The kit of claim 7 wherein said pyrophosphatase is thermostable.

9. Apparatus for DNA sequencing having a reactor comprising a DNA polymerase of any of claims 1-4 and a band
separator.
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FIG. 1 
(sheet 1) 



*DHA sequence 1666 b.p. atgctggagagg ... gccaaggagcga linear 

> 

1/1 31/11 

atg ctg gag agg ctt gag .ttt ggc age ccc etc cac gag ttc ggc ctt ctg gaa age ccc 
MLERLEPGSLtLKEFCLLES P 
61/21 91/31 

aag gee ctg gag gag gec ccc tgg ccc ccg ccg gaa ggg gee ccc gcg ggc CCC gcg ccc 
KALEEAPWPPPEGAFVGFVL 
121/41 1S1/S1 

ccc cgc aag gag ccc atg egg gec gac ccc ctg gee ctg gee gee gec agg ggg ggc egg 
SRKEPHWADLLALAAARGCR 
181/61 211/71 

gec cac egg gec ccc gag ccc cac aaa gec etc agg gac ccg aag gag gcg egg ggg ccc 
VH RAPBPYKALRDLKEA RG L 
241/81 271/91 

etc gee aaa gac ctg age gtt ctg gee ctg agg gaa ggc ctt ggc etc ccg ccc ggc gac 
LAKDLSVLALRECLGLPPGD 
301/101 331/111 

gac ccc atg etc etc gee tac ccc ccg gac ccc tec aac ace acc ccc gag ggg gcg gec 
DPMLLAYLLDPSNTTPEGVA 
361/121 391/131 

egg cgc cac ggc ggg gag tgg acg gag gag gcg ggg gag egg gec gec ccc tec gag agg 
RRYCGEWTEEACERAALSER 
421/141 451/151 

etc ttc gee aac ccg egg ggg agg ccc. gag ggg gag gag agg ccc ccc tgg ccc cac egg 
LFAHLWGRLEGEERLLWLYR 
481/161 511/171 

gag gtg gag agg ccc etc ccc get gee ccg gee cac acg gag gec acg ggg gtg cgc ccg 
EVERPLSAVLAHMEATGVRL 
541/181 571/191 

gac gtg gec cac etc agg gec ttg tec ctg gag gcg gee gag gag acc gee cgc ccc gag • 
DVAYLRALSL EVAEEIARLE 
601/201 631/211 

gee gag gee ttc cgc ctg gee ggc cac ccc ccc aac ccc aac tec egg gac cag ccg gaa 
AEVFRLAGH P PN L NS ROQL E 
661/221 691/231 

agg gee ccc ccc gac gag eta ggg cCC ccc gec acc ggc aag acg gag aag acc ggc aag 
RVLFDELGLPAICKTEKTGK 
721/241 7S1/251 

cgc tec acc age gec gee gtc ctg gag gec ccc cgc gag gee cac ccc acc gcg gag aag 
RSTSAAVLEALREAHP.IVEK 
781/261 811/271 

ate ctg cag tac egg gag etc acc aag ctg aag age acc tac act gac ccc ttg ccg gac 
ILOYRELTKLKSTYIDPLPD 
641/281 871/291 

etc acc cac ccc agg acg ggc cgc ccc cac acc cgc etc aac cag acg gee acg gee acg 
LI HPRTGRL^HTR FNQTATAT 
901/301 931/311 

ggc agg eta agt age tec gat ccc aac ccc cag aac ate ccc gec cgc acc ccg ccc ggg 
GRLSSSDPNLQN I PVRTPLG 
961/321 991/331 

cag agg acc cgc egg gec ccc acc gec gag gag ggg egg cca ttg gtg gec ctg gac cat 
QRIRRAFIAEECWLLVALDY 
1021/341 1051/351 

age cag aca gag etc agg gtg ccg gec cac etc Ccc ggc gac gag aac ccg acc egg gtc 
SQIELRVLAHL SGDE NLIRV 
1081/361 1111/371 

ttc cag gag ggg egg gac ate cac acg gag acc gee age tgg atg ttc ggc gtc ccc egg 
FQEGROIHTETASWMFGVPR 
1141/381 1171/391 

gag gee gtg gac ccc ctg atg cgc egg gcg gee aag acc ate aac cac ggg gec ccc cac 
EAVDPLMRRAAKTINYGV LY 
1201/401 1231/411 

ggc atg teg gec cac cgc etc tec cag gag eta gec acc ccc cac gag gag gec cag qcc 

GMSAHRLSQELAIPYEEAQA
1261/421 1291/431

tcc att gag cgc tac ttt cag agc ttc ccc aag gtg cgg gcc tgg att gag aag acc ctg
FIERYFQSFPKVRAWIEKTL
1321/441 1351/451

gag gag ggc agg agg cgg ggg tac gcg gag acc ctc ttc ggc cgc cgc cgc tac gtg cca
EEGRRRGYVETLFGRRRYVP
1381/461 1411/471

gac cta gag gcc cgg gtg aag agc gtg cgg gag gcg gcc gag cgc atg gcc ttc aac atg
DLEARVKSVREAAERMAFNM
1441/481 1471/491

ccc gtc cag ggc acc gcc gcc gac ctc atg aag ctg gct atg gtg aag ctc ttc ccc agg
PVQGTAADLMKLAMVKLFPR
1501/501 1531/511

ctg gag gaa atg ggg gcc agg atg ctc ctt cag gtc cac gac gag ctg gtc ctc gag gcc
LEEMGARMLLQVHDELVLEA
1561/521 1591/531

cca aaa gag agg gcg gag gcc gtg gcc cgg ctg gcc aag gag gtc atg gag ggg gtg tat
PKERAEAVARLAKEVMEGVY
1621/541 1651/551

ccc ctg gcc gtg ccc ctg gag gtg gag gtg ggg ata ggg gag gac tgg ctc tcc gcc aag
PLAVPLEVEVGIGEDWLSAK
1681/561

gag tga
E *



FIG. 1
(sheet 2)

FIG. 2
(sheet 1)

DNA sequence 1689 b.p. atggccccggaa ... gccaaggagtga linear

1/1 31/11 

atg gct ctg gaa cgt ctg gag ttt ggc agc ctc ctc cac gag ttc ggc ctc ctg gaa agc
MALERLEFGSLLHEFGLLES
61/21 91/31

ccc aag gcc ctg gag gag gcc ccc tgg ccc ccg ccg gaa ggg gcc ttc gtg ggc ttt gcg
PKALEEAPWPPPEGAFVGFV
121/41 151/51

ctt tcc cgc aag gag ccc acg tgg gcc gat ctt ctg gcc ctg gcc gcc gcc agg ggg ggc
LSRKEPMWADLLALAAARGG
181/61 211/71

cgg gtc cac cgg gcc ccc gag cct tat aaa gcc ctc agg gac ctg aag gag gcg cgg ggg
RVHRAPEPYKALRDLKEARG
241/81 271/91

ctt ctc gcc aaa gac ccg agc gtt ctg gcc ctg agg gaa ggc ctt ggc ctc ccg ccc ggc
LLAKDLSVLALREGLGLPPG
301/101 331/111

gac gac ccc atg ctc ctc gcc tac ctc ctg gac cct tcc aac acc acc ccc gag ggg gtg
DDPMLLAYLLDPSNTTPEGV
361/121 391/131

gcc cgg cgc tac ggc ggg gag tgg acg gag gag gcg ggg gag cgg gcc gcc ctt tcc gag
ARRYGGEWTEEAGERAALSE
421/141 451/151

agg ctc ttc gcc aac ctg tgg ggg agg ctt gag ggg gag gag agg ctc cct tgg ctt tac
RLFANLWGRLEGEERLLWLY
481/161 511/171 

cgg gag gtg gag agg ccc ctt tcc gct gtc ctg gcc cac atg gag gcc acg ggg gtg cgc
REVERPLSAVLAHMEATGVR
541/181 571/191

ctg gac gtg gcc cat ctc agg gcc ttg tcc ctg gag gtg gcc gag gag atc gcc cgc ctc
LDVAYLRALSLEVAEEIARL
601/201 631/211

gag gcc gag gtc ttc cgc ctg gcc ggc cac ccc ttc aac ccc aac tcc cgg gac cag ctg
EAEVFRLAGHPFNLNSRDQL
661/221 691/231

gaa agg gtc ctc ttt gac gag cca ggg cct ccc gcc atc ggc aag acg gag aag acc ggc
ERVLFDELGLPAIGKTEKTG
721/241 751/251

aag cgc tcc acc agc gcc gcc gcc ctg gag gcc ctc cgc gag gcc cac ccc acc gcg gag
KRSTSAAVLEALREAHPIVE
781/261 811/271

aag acc ctg cag tac cgg gag ctc acc aag ccg aag agc acc cac att gac ccc ttg ccg
KILQYRELTKLKSTYIDPLP
841/281 871/291

gac ctc atc cac ccc agg acg ggc cgc ctc cac acc cgc ccc aac cag acg gcc acg gcc
DLIHPRTGRLHTRFNQTATA
901/301 931/311

acg ggc agg cta agt agc tcc gat ccc aac ctc cag aac atc ccc gcc cgc acc ccg ctt
TGRLSSSDPNLQNIPVRTPL
961/321 991/331

ggg cag agg acc cgc cgg gcc ccc acc gcc gag gag ggg tgg cca ccg gcg gcc ccg gac
GQRIRRAFIAEEGWLLVALD
1021/341 1051/351

tat agc cag ata gag ccc agg gcg ccg gcc cac ccc tcc ggc gac gag aac ctg acc cgg
YSQIELRVLAHLSGDENLIR
1081/361 1111/371

gtc ttc cag gag ggg cgg gac atc cac acg gag acc gcc agc tgg acg ccc ggc gcc ccc
VFQEGRDIHTETASWMFGVP
1141/381 1171/391

cgg gag gcc gtg gac ccc ctg acg cgc cgg gcg gcc aag acc atc aac cac ggg gcc ccc
REAVDPLMRRAAKTINYGVL
1201/401 1231/411

tac ggc atg ccg gcc cac cgc ccc tcc cag gag cta gcc acc cct tac gag gag gcc cag
YGMSAHRLSQELAIPYEEAQ
1261/421 1291/431

gcc ttc act gag cgc tac ttt cag agc ctc ccc aag gtg cgg gcc tgg att gag aag acc
AFIERYFQSFPKVRAWIEKT
1321/441 1351/451

atg gag gag ggc agg agg cgg ggg tac gtg gag acc ccc ttc ggc cgc cgc cgc tac gtg
LEEGRRRGYVETLFGRRRYV
1381/461 1411/471

cca gac cta gag gcc cgg gtg aag agc gcg cgg gag gcg gcc gag cgc acg gcc ttc aac
PDLEARVKSVREAAERMAFN
1441/481 1471/491

atg ccc gtc cag ggc acc gcc gcc gac ctc atg aag ctg gct atg gtg aag ctc ttc ccc
MPVQGTAADLMKLAMVKLFP
1501/501 1531/511

agg ctg gag gaa atg ggg gcc agg atg ctc ctt cag gtc cac gac gag ctg gcc ccc gag
RLEEMGARMLLQVHDELVLE
1561/521 1591/531

gcc cca aaa gag agg gcg gag gcc gcg gcc cgg ccg gcc aag gag gcc acg gag ggg gtg
APKERAEAVARLAKEVMEGV
1621/541 1651/551

tat ccc ctg gcc gcg ccc ctg gag gcg gag gcg ggg ata ggg gag gac tgg ctc tcc gcc
YPLAVPLEVEVGIGEDWLSA
1681/561

aag gag tga
K E *



FIG. 2
(sheet 2)

DNA sequence 2496 b.p. atggcgatgctt ... gccaaggagtag linear

1/1 31/11 

atg gcg atg ctc ccc ctc ttt gag ccc aaa ggc cgc gcg ctc ctg gtg gac ggc cac cac
MAMLPLFEPKGRVLLVDGHH
61/21 91/31

ctg gcc tac cgc acc ttc ttt gcc ctc aag ggc ctc acc acc agc cgc ggc gaa ccc gtt
LAYRTFFALKGLTTSRGEPV
121/41 151/51

cag gcg gtc tac ggc ttc gcc aaa agc ctc ctc aag gcc ctg aag gag gac ggg gac gtg
QAVYGFAKSLLKALKEDGDV
181/61 211/71

gtg gtg gtg gtc ttt gac gcc aag gcc ccc tcc ttc cgc cac gag gcc tac gag gcc tac
VVVVFDAKAPSFRHEAYEAY
241/81 271/91 

aag gcg ggc cgg gcc ccc acc ccg gag gac ttt ccc cgg cag ctg gcc ctc atc aag gag
KAGRAPTPEDFPRQLALIKE
301/101 331/111

ttg gtg gac ctc cta ggc ctt gtg cgg ctg gag gtt ccc ggc ttt gag gcg gac gac gcg
LVDLLGLVRLEVPGFEADDV
361/121 391/131

ctg gcc acc ctg gcc aag cgg gcg gaa aag gag ggg tac gag gtg cgc atc ctc act gcc
LATLAKRAEKEGYEVRILTA
421/141 451/151

gac cgc gac ctc tac cag ctc ctt tcg gag cgc atc gcc atc ctc cac cct gag ggg tac
DRDLYQLLSERIAILHPEGY
481/161 511/171 

ctg atc acc ccg gcg tgg ctt tac gag aag tac ggc ctg cgc ccg gag cag cgg gcg gac
LITPAWLYEKYGLRPEQWVD
541/181 571/191

tac cgg gcc ctg gcg ggg gac ccc tcg gat aac atc ccc ggg gtg aag ggc acc ggg gag
YRALAGDPSDNIPGVKGIGE
601/201 631/211

aag acc gcc cag agg ctc atc cgc gag tgg ggg agc ctg gaa aac ctc ttc cag cac ctg
KTAQRLIREWGSLENLFQHL
661/221 691/231

gac cag gtg aag ccc tcc ttg cgg gag aag ctc cag gcg ggc atg gag gcc ccg gcc cct
DQVKPSLREKLQAGMEALAL
721/241 751/251

tcc cgg aag ctt tcc cag gtg cac act gac ctg ccc ctg gag gtg gac ttc ggg agg cgc
SRKLSQVHTDLPLEVDFGRR
781/261 811/271

cgc aca ccc aac ctg gag ggt ccg cgg gcc ccc ccg gag cgg ccg gag ccc gga agc ccc
RTPNLEGLRAFLERLEFGSL
841/281 871/291

ccc cac gag ccc ggc ccc ccg gag ggg ccg aag gcg gca gag gag gcc ccc cgg ccc ccc
LHEFGLLEGPKAAEEAPWPP
901/301 931/311

ccg gaa ggg gcC ttt ttg ggc ttt tcc ttt tcc cgc ccc gag ccc acg cgg gcc gag ccc
PEGAFLGFSFSRPEPMWAEL
961/321 991/331

ccg gcc ccg gct ggg gcg tgg gag ggg cgc ccc cac cgg gca caa gac ccc ccc agg ggc
LALAGAWEGRLHRAQDPLRG
1021/341 1051/351

ccg agg gac ccc aag ggg gcg cgg gga acc ccg gcc aag gac ccg gcg gtc Ctg gcc ctg
LRDLKGVRGILAKDLAVLAL
1081/361 1111/371

cgg gag ggc ctg gac ctc ttc cca gag gac gac ccc atg ctc ctg gcc cac ccc ccg gac
REGLDLFPEDDPMLLAYLLD
1141/381 1171/391

ccc ccc aac acc acc ccc gag ggg gtg gcc cgg cgC cac ggg ggg gag cgg acg gag gac
PSNTTPEGVARRYGGEWTED
1201/401 1231/411

gcg ggg gag agg gcc ccc ccg gcc gag cgc ccc ttc cag acc cca aag gag cgc ctt aag
AGERALLAERLFQTLKERLK
1261/421 1291/431

gga gaa gaa cgc ccg ctc cgg cct tac gag gag gtg gag aag ccg ctc tcc cgg gtg tcg
GEERLLWLYEEVEKPLSRVL
1321/441 1351/451

gcc cgg acg gag gcc acg ggg gtc cgg ctg gac gtg gcc tac ctc cag gcc ctc tcc ctg
ARMEATGVRLDVAYLQALSL
1381/461 1411/471

gag gtg gag gcg gag gtg cgc cag ctg gag gag gag gtc ttc cgc ctg gcc ggc cac ccc
EVEAEVRQLEEEVFRLAGHP
1441/481 1471/491

ttc aac ctc aac tcc cgc gac cag ctg gag cgg gtg ctc ttt gac gag ccg ggc ctg cct
FNLNSRDQLERVLFDELGLP
1501/501 1531/511 

gcc atc ggc aag acg gag aag acg ggg aaa cgc tcc acc agc gct gcc gtg ctg gag gcc
AIGKTEKTGKRSTSAAVLEA
1561/521 1591/531

ctg cga gag gcc cac ccc atc gtg gac cgc atc ctg cag tac cgg gag ctc acc aag ctc
LREAHPIVDRILQYRELTKL
1621/541 1651/551

aag aac acc tac ata gac ccc ctg ccc gcc ctg gtc cac ccc aag acc ggc cgg ctc cac
KNTYIDPLPALVHPKTGRLH
1681/561 1711/571

acc cgc ttc aac cag acg gcc acc gcc acg ggc agg ctt tcc agc tcc gac ccc aac ccg
TRFNQTATATGRLSSSDPNL
1741/581 1771/591

cag aac atc ccc gcg cgc acc cct ctg ggc cag cgc acc cgc cga gcc ttc gcg gcc gag
QNIPVRTPLGQRIRRAFVAE
1801/601 1831/611 

gag ggc tgg gtg ctg gcg gtc ttg gac tac agc cag acc gag ccc cgg gcc ccg gcc cac
EGWVLVVLDYSQIELRVLAH
1861/621 1891/631

ctc tcc ggg gac gag aac ctg acc cgg gcc tcc cag gag ggg agg gac acc cac acc cag
LSGDENLIRVFQEGRDIHTQ
1921/641 1951/651

acc gcc agc tgg acg ttc ggc gtt tcc ccc gaa ggg gta gac cct ctg acg cgc cgg gcg
TASWMFGVSPEGVDPLMRRA
1981/661 2011/671

gcc aag acc atc aac ttc ggg gcg ccc cac ggc acg ccc gcc cac cgc ccc ccc ggg gag
AKTINFGVLYGMSAHRLSGE
2041/681 2071/691 

ccc tcc acc ccc cac gag gag gcg gtg gcc ttc atc gag cgc cac ccc cag agc tac ccc
LSIPYEEAVAFIERYFQSYP
2101/701 2131/711

aag gcg cgg gcc tgg act gag ggg acc ccc gag gag ggc cgc cgg cgg ggg CaC gcg gag
KVRAWIEGTLEEGRRRGYVE
2161/721 2191/731

acc ccc ttc ggc cgc cgg cgc tat gtg ccc gac ctc aac gcc cgg gcg aag agc gcg cgc
TLFGRRRYVPDLNARVKSVR
2221/741 2251/751

gag gcg gcg gag cgc atg gcc tcc aac acg ccg gcc cag ggc acc gcc gcc gac ctc acg
EAAERMAFNMPVQGTAADLM
2281/761 2311/771 

aag ccg gcc acg gcg cgg ccc ttc ccc cgg ctt cag gaa ctg ggg gcg agg atg ctt tcg
KLAMVRLFPRLQELGARMLL
2341/781 2371/791

cag gcg cac gac gag ccg gtc ccc gag gcc ccc aag gac cgg gcg gag agg gta gcc gcc
QVHDELVLEAPKDRAERVAA
2401/801 2431/811

CCg gcc aag gag gcc acg gag ggg gtc cgg ccc ccg cag gcg ccc ctg gag gtg gag gcg
LAKEVMEGVWPLQVPLEVEV
2461/821 2491/831 
ggc ccg ggg gag gac tgg ctc tcc gcc aag gag tag
GLGEDWLSAKE *

FIG. 4
(sheet 1)

DNA sequence 2505 b.p. ATCCAGCCCATG ... CCCAAGGGTTAG linear

coding sequence of T. thermophilus DNA polymerase as submitted by D. Gelfand in WO 91/09950 PCT/US90/07i

1/1 31/11

VPG GAO GOG ATO CTT CCC CIC TTT CAA CCC AAA GCC COG GTC CTC CTG GTQ QAC GOC CAC 
MEAMLPLFEPKGRVLLVDGH
61/21 91/31 

ZAC CTC CCC TAC CCC ACC TTC TIC CCC CTC AAC CCC CIC ACC ACQ AOC COO. OCC QAA CCC . 
HLAYRTFFALKGLTTSRGEP
121/41 151/51 

CTC CAG OCC CTC TAC CCC TIC CCC AAG AOC CTC CTC AAC OCC CTC AAC CAC CAC CCC TAC 
VQAVYGFAKSLLKALKEDGY
181/61 211/71 

AAC OCC GTC TTC CTG GTC TTT GAG GCC AAG OCC CCC TCC TIC CCC CAC GAG GOC TAC CAG 
KAVFVVFDAKAPSFRHEAYE
241/81 271/91 

CCC TAC AAC COG GOG AGO GCC COG ACC CCC CAG GAC TIC CCC COG CAG CTC OCC CTC ATC 
AYKAGRAPTPEDFPRQLALI
301/101 331/111 

AAG GAG CTG CTG GAC CTC CTG GOG TTT ACC CCC CIC GAG GTC CCC GCC TAC GAG CCC GAC 
KELVDLLGFTRLEVPGYEAD
361/121 391/131 

GAC CTT CTC OCC ACC CTG GCC AAG AAG GCC GAA AAC GAG CCC TAC CAG GTC CCC ATC CTC 
DVLATLAKKAEKEGYEVRIL
421/141 451/151 

ACC OCC GAC CCC GAC CTC TAC CAA CTC GTC TCC GAC CCC CTC OCC CTC CTC CAC CCC GAG 
TADRDLYQLVSDRVAVLHPE
481/161 511/171

CCC CAC CTC ATC ACC CCG GAG TOG CTT TOG CAC AAG TAC GOC CTC AGO CCC CAG CAG TCC 
GHLITPEWLWEKYGLRPEQW
541/181 571/191 

CTC GAC TTC CCC GCC CTC GTC GCC CAC CCC TCC GAC AAC CIC CCC GCC GTC AAG CCC ATC 
VDFRALVGDPSDNLPGVKGI
601/201 631/211 

CCC GAG AAC ACC CCC CTC AAG CTC CTC AAC GAG TCC CCA ACC CTG GAA AAC CTC CIC AAG 
GEKTALKLLKEWGSLENLLK
661/221 691/231 

AAC CTC GAC COG CTA AAG CCA GAA AAC GTC COG GAC AAG A1C AAC CCC CAC CTC CAA GAC 
NLDRVKPENVREKIKAHLED
721/241 751/251

CTC AGO CTC TCC TTC CAG CTC TCC COG GTC CCC ACC CAC CIC CCC CTC GAG GTC GAC CTC 
LRLSLELSRVRTDLPLEVDL
781/261 811/271 

GCC CAG CCG CCC GAG CCC CAC CCC CAG COG CTT AGG GCC TIC CIC GAG ACC CTC GAG TTC 
AQGREPDREGLRAFLERLEF
841/281 871/291 

GCC AOC CTC CTC CAC GAG TIC GCC CTC CTC GAG GCC CCC GCC CCC CTC GAG CAC GCC CCC 
GSLLHEFGLLEAPAPLEEAP
901/301 931/311 

TCC CCC CCC CCG GAA CCC CCC TTC CTG GOC TTC GTC CTC TCC CCC CCC CAG CCC ATC TCG 
WPPPEGAFVGFVLSRPEPMW
961/321 991/331 

GCC GAG CTT AAA GCC CTC GCC CCC TCC AGG GAC GGC CCG GTC CAC CCC CCA CCA GAC CCC 
AELKALAACRDGRVHRAADP
1021/341 1051/351 

TTC CCG GCC CTA AAG GAC CTC AAG GAG CTC COG GGC CTC CTC GCC AAG GAC CTC GCC GTC 
LAGLKDLKEVRGLLAKDLAV
1081/361 1111/371 

TTC GCC TCC AGG GAG GGC CTA CAC CTC CTC CCC GGC CAC CAC CCC A1C CTC CIC CCC TAC 
LASREGLDLVPGDDPMLLAY
1141/381 1171/391 

CTC CTG CAC CCC TCC AAC ACC ACC CCC GAG GCC GTC CCG CCC CCC TAC GGC GGC CAG ICC 
LLDPSNTTPEGVARRYGGEW
1201/401 1231/411

ACQ GAG GAC GCC GCC CAC CGG GCC CTC CTC TCC GAG ACQ C1C CAT COG AAC CTC CTT AAC 
TEDAAHRALLSERLHRNLLK
1261/421 1291/431

CGC CTC GAG GOG GAG GAG AAC CTC CTT TOG CTC TAC CAC GAG GTG GAA AAC CCC CTC TCC 
RLEGEEKLLWLYHEVEKPLS
1321/441 1351/451

COG GTC CTC GCC CAC ATG GAG GCC ACC GOG GTA CGG CTG CAC CTC CCC TAC CTT CAC CCC 
RVLAHMEATGVRLDVAYLQA
1381/461 1411/471 

CTT TCC CTG GAC CTT GCC GAG GAG ATC CGC CCC CTC GAG GAC CAC CTC TIC CCC TTG GOG 
LSLELAEEIRRLEEEVFRLA
1441/481 1471/491 

GCC CAC CCC TTC AAC CTC AAC TCC CGG CAC CAC CTC GAA ACC GTG CTC TTT GAC GAG CTT 
GHPFNLNSRDQLERVLFDEL
1501/501 1531/511 

AGO CTT CCC GCC TTG GCC AAC ACC CAA AAC ACA OGC AAC CCC TCC ACC ACC GCC GCC GTC 
RLPALGKTQKTGKRSTSAAV
1561/521 1591/531

CTG GAG GCC CTA CGG GAG GCC CAC CCC ATC GTG GAG AAC ATC CTC CAG CAC CGG GAG CTC 
LEALREAHPIVEKILQHREL
1621/541 1651/551

ACC AAG CTC AAG AAC ACC TAC CTG GAC CCC CTC CCA ACC CTC GTC CAC CCG AGG ACC CGC 
TKLKNTYVDPLPSLVHPRTG
1681/561 1711/571 

CGC CTC CAC ACC CGC TTC AAC CAG ACC GCC ACC GCC ACC CGG AGG CTT ACT AGC TCC GAC 
RLHTRFNQTATATGRLSSSD
1741/581 1771/591 

CCC AAC CTG CAG AAC ATC CCC CTC CGC ACC CCC TTC CCC CAC AGC ATC CCC CCG CCC TTC 
PNLQNIPVRTPLGQRIRRAF
1801/601 1831/611

GTG GCC GAG GOG GOT TOG CCC TTG CTG GCC ClC GAC TAT AGC CAG ATA GAG CTC CGC GTC 
VAEAGWALVALDYSQIELRV
1861/621 1891/631 

CTC GCC CAC CTC TCC CCC GAC GAA AAC CTG ATC AGG GTC TTC CAG GAG GOG AAC CAC ATC 
LAHLSGDENLIRVFQEGKDI
1921/641 1951/651

CAC ACC CAG ACC GCA AGC TOG ATG TTC CGC CTC CCC CCG CAC CCC CTC CAc CCC CTC ATG 
HTQTASWMFGVPPEAVDPLM
1981/661 2011/671

CCC COG GCC GCC AAG ACC CTG AAC TTC CCC CTC C1C TAC CGC ATG TCC GCC CAT AGG CPC 
RRAAKTVNFGVLYGMSAHRL
2041/681 2071/691 

TCC CAG GAG CTT GCC ATC CCC TAC GAC GAG GCC CTG GCC TTT ATA CAC CGC TAC TIC CAA 
SQELAIPYEEAVAFIERYFQ
2101/701 2131/711 

AGC TTC CCC AAG GTG CGG GCC TGG ATA GAA AAG ACC CTG CAG GAC GCC AGG AAG CGG CGC 
SFPKVRAWIEKTLEEGRKRG
2161/721 2191/731 

TAC GTC GAA ACC CTC TTC GCA ACA AGC CCC TAC CTG CCC CAC CTC AAC CCC CCC GTG AAG 
YVETLFGRRRYVPDLNARVK
2221/741 2251/751 

AGC GTC AGG GAC CCC GCC CAG CCC ATG CCC TTC AAC ATG CCC CTC CAC GCC ACC GCC CCC 
SVREAAERMAFNMPVQGTAA 
2281/761 2311/771 

GAC C1C ATC AAG CTC GCC ATC CTC AAG CTC TIC CCC CCC CTC CGC CAG ATG GCC CCC CCC 
DLMKLAMVKLFPRLREMGAR
2341/781 2371/791 

ATG CTC CTC CAG GTC CAC GAC GAG CTC CTC CTC CAG GCC CCC CAA GCC CGG GCC GAG GAG 
MLLQVHDELLLEAPQARAEE
2401/801 2431/811 

GTG GOG GOT TIC GCC AAG GAG GCC ATC CAG AAC CCC TAT CCC CTC CCC CTC CCC CTC CAG 

VAALAKEAMEKAYPLAVPLE 

2461/821 2491/831 

GTG GAG GTG GGG ATC GGG CAC GAC TCC CTT TCC GCC AAC OCT TAC 

VEVGMGEDWLSAKG



FIG. 4

(sheet 2)

FIG. 5

(Sheet 1)

DNA and protein sequence of the coding region of pMR8, encoding FY 4
1/1 31/11

ATG CTG GAA CGT CTG GAA TTC GGC AGC CTC CTC CAC GAG TTC GGC CTC CTG GAG GCC CCC
MLERLEFGSLLHEFGLLEAP
61/21 91/31

GCC CCC CTG GAG GAG GCC CCC TGG CCC CCG CCG GAA GGG GCC TTC GTG GGC TTC GTC CTC
APLEEAPWPPPEGAFVGFVL
121/41 151/51

TCC CGC CCC GAG CCC ATG TGG GCG GAG CTT AAA GCC CTG GCC GCC TGC AGG GAC GGC CGG
SRPEPMWAELKALAACRDGR
181/61 211/71

GTG CAC CGG GCA GCA GAC CCC TTG GCG GGG CTA AAG GAC CTC AAG GAG GTC CGG GGC CTC
VHRAADPLAGLKDLKEVRGL
241/81 271/91

CTC GCC AAG GAC CTC GCC GTC TTG GCC TCG AGG GAG GGG CTA GAC CTC GTG CCC GGG GAC
LAKDLAVLASREGLDLVPGD
301/101 331/111 

GAC CCC ATG CTC CTC GCC TAC CTC CTG GAC CCC TCC AAC ACC ACC CCC GAG GGG GTG GCG 

DPMLLAYLLDPSNTTPEGVA
361/121 391/131 

CGG CGC TAC GGG GGG GAG TGG ACG GAG GAC GCC GCC CAC CGG GCC CTC CTC TCG GAG AGG 

RRYGGEWTEDAAHRALLSER
421/141 451/151 

CTC CAT CGG AAC CTC CTT AAG CGC CTC GAG GGG GAG GAG AAG CTC CTT TGG CTC TAC CAC 
LHRNLLKRLEGEEKLLWLYH
481/161 511/171

GAG GTG GAA AAG CCC CTC TCC CGG GTC CTG GCC CAC ATG GAG GCC ACC GGG GTA CGG CTG
EVEKPLSRVLAHMEATGVRL
541/181 571/191

GAC GTG GCC TAC CTT CAG GCC CTT TCC CTG GAG CTT GCG GAG GAG ATC CGC CGC CTC GAG
DVAYLQALSLELAEEIRRLE
601/201 631/211

GAG GAG GTC TTC CGC TTG GCG GGC CAC CCC TTC AAC CTC AAC TCC CGG GAC CAG CTG GAA
EEVFRLAGHPFNLNSRDQLE
661/221 691/231

AGG GTG CTC TTT GAC GAG CTT AGG CTT CCC GCC TTG GGG AAG ACG CAA AAG ACA GGC AAG
RVLFDELRLPALGKTQKTGK
721/241 751/251

CGC TCC ACC AGC GCC GCG GTG CTG GAG GCC CTA CGG GAG GCC CAC CCC ATC GTG GAG AAG
RSTSAAVLEALREAHPIVEK
781/261 811/271

ATC CTC CAG CAC CGG GAG CTC ACC AAG CTC AAG AAC ACC TAC GTG GAC CCC CTC CCA AGC
ILQHRELTKLKNTYVDPLPS
841/281 871/291 
CTC GTC CAC CCG AGG ACG GGC CGC CTC CAC ACC CGC TTC AAC CAG ACG GCC ACG GCC ACG 

LVHPRTGRLHTRFNQTATAT 
901/301 931/311 

GGG AGG CTT AGT AGC TCC GAC CCC AAC CTG CAG AAC ATC CCC GTC CGC ACC CCC TTG GGC 
GRLSSSDPNLQNIPVRTPLG
961/321 991/331 

CAG AGG ATC CGC CGG GCC TTC GTG GCC GAG GCG GGT TGG GCG TTG GTG GCC CTG GAC TAT 

QRIRRAFVAEAGWALVALDY
1021/341 1051/351

AGC CAG ATA GAG CTC CGC GTC CTC GCC CAC CTC TCC GGG GAC GAA AAC CTG ATC AGG GTC
SQIELRVLAHLSGDENLIRV
1081/361 1111/371

TTC CAG GAG GGG AAG GAC ATC CAC ACC CAG ACC GCA AGC TGG ATG TTC GGC GTC CCC CCG
FQEGKDIHTQTASWMFGVPP
1141/381 1171/391

GAG GCC GTG GAC CCC CTG ATG CGC CGG GCG GCC AAG ACG GTG AAC TAC GGC GTC CTC TAC
EAVDPLMRRAAKTVNYGVLY
1201/401 1231/411

GGC ATG TCC GCC CAT AGG CTC TCC CAG GAG CTA GCC ATC CCC TAC GAA GAA GCG GTG GCC
GMSAHRLSQELAIPYEEAVA
1261/421 1291/431

TTT ATA GAG CGC TAC TTC CAA AGC TTC CCC AAG GTG CGG GCC TGG ATA GAA AAG ACC CTG
FIERYFQSFPKVRAWIEKTL
1321/441 1351/451

GAG GAG GGG AGG AAG CGG GGC TAC GTG GAA ACC CTC TTC GGA AGA AGG CGC TAC GTG CCC
EEGRKRGYVETLFGRRRYVP
1381/461 1411/471

GAC CTC AAC GCC CGG GTG AAG AGC GTC AGG GAG GCC GCG GAG CGC ATG GCC TTC AAC ATG
DLNARVKSVREAAERMAFNM
1441/481 1471/491

CCC GTC CAG GGC ACC GCC GCC GAC CTC ATG AAG CTC GCC ATG GTG AAG CTC TTC CCC CGC
PVQGTAADLMKLAMVKLFPR
1501/501 1531/511

CTC CGG GAG ATG GGG GCC CGC ATG CTC CTC CAG GTC CAC GAC GAG CTC CTC CTG GAG GCC
LREMGARMLLQVHDELLLEA
1561/521 1591/531

CCC CAA GCG CGG GCC GAG GAG GTG GCG GCT TTG GCC AAG GAG GCC ATG GAG AAG GCC TAT
PQARAEEVAALAKEAMEKAY
1621/541 1651/551

CCC CTC GCC GTG CCC CTG GAG GTG GAG GTG GGG ATG GGG GAG GAC TGG CTT TCC GCC AAG
PLAVPLEVEVGMGEDWLSAK
1681/561

GGT TAG
G *